Re: Interesting Research Paper on Constructors in OO Languages

Meta Mon, 15 Jul 2013 18:56:15 -0700

On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:

I consider myself to be a "systematic" programmer (according to the definition in the paper); I can work equally well with ctors with arguments vs. create-set-call objects. But I find that mandatory ctors with arguments are a pain to work with, *both* to write and to use.

I also find constructors with multiple arguments a pain to use. They get difficult to maintain as your project grows. One of my pet projects has a very shallow class hierarchy, but the constructors of each object down the tree have many arguments, with descendants adding on even more. It gets to be a real headache when you have more than 3 constructors per class to deal with base class overloads, multiple arguments, etc.

On the usability side, there's the mental workload of having to remember which order the arguments appear in (or look it up in the IDE, or whatever -- the point is that I can't just type the ctor call straight from my head). Then there's the problem of needing to create objects required by the ctor before you can call the ctor. In some cases, this can be inconvenient -- I always have to remember to set up and create other objects before I can create this one, because its ctor requires said objects as arguments. Then there's the lack of flexibility: no matter what you do, it seems that anything that requires more than a single ctor argument inevitably becomes either (1) too complex, requiring too many arguments, and therefore very difficult to use, or (2) too simplistic, and therefore unable to do some things that I may want to do (e.g. some fields are default-initialized with no way to specify the initial values of the fields, 'cos otherwise the ctor would have too many arguments). No matter what you do, it seems almost impossible to come up with an ideal ctor except in trivial cases where it requires only 1 argument or is a default ctor.

Having to create other objects to pass to a constructor is particularly painful. You'd better pray that they have trivial constructors, or else things can get hairy really fast. Multiple nested constructors can also create a large amount of code bloat. Once the constructor grows large enough, I generally put each argument on its own line to ensure that it's clear what I'm calling it with. This has the unfortunate side effect of making the call span multiple lines. In my opinion, a constructor requiring more than 10 lines is an unsightly abomination.

On the writability side, one of my pet peeves is base class ctors that require multiple arguments. Every level of inheritance inevitably adds more arguments each time, and by the time you're 5-6 levels down the class hierarchy, your ctor calls just have an unmanageable number of parameters. Not to mention the violation of DRY by requiring much redundant typing just to pass arguments from the inherited class' ctor up the class hierarchy. Tons of bugs to be had everywhere, given the amount of repeated typing needed.

In the simplest cases, of course, these aren't big issues, but this kind of ctor design is clearly not scalable.

OTOH, the create-set-call pattern isn't a panacea either. One of the biggest problems with this pattern is that you can't guarantee your objects are in a consistent state at all times. This is very bad, because all your methods will have to check if some value has been set yet before using it. This adds a lot of complexity that could've been avoided had everything been set at ctor-time. This also makes class invariants needlessly complex. Moreover, I've seen many classes in this category exhibit undefined behaviour if you call a value-setting method after you start using the object. Too many classes falsely assume that you will always call set methods and then "use" methods in that order. If you call a set method after calling a "use" method, you're quite likely to run into bugs in the class, e.g. part of the object's state doesn't reflect the new value you set, because the "use" methods were written with the assumption that when they were called the first time, the values you set earlier won't change thereafter.
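
The set-after-use failure described above can be reproduced in a few lines of C++. This is only a sketch: Scaler, setFactor, scaled and lateSetResult are hypothetical names invented for illustration, not taken from the paper or from any real library.

```cpp
// Hypothetical create-set-call class: default-constructed, configured
// through a setter, then "used".
class Scaler {
public:
    // Setter updates the field but does NOT invalidate the cached result.
    void setFactor(int f) { factor_ = f; }

    // "Use" method: computes a derived value on the first call and caches
    // it, silently assuming setFactor() is never called again afterwards.
    int scaled() {
        if (!cached_) {
            cache_ = factor_ * 10;
            cached_ = true;
        }
        return cache_;
    }

private:
    int factor_ = 0;      // still 0 if scaled() runs before any setFactor()
    bool cached_ = false;
    int cache_ = 0;
};

// Calls set, use, set, use and returns what the second "use" reports.
inline int lateSetResult() {
    Scaler s;
    s.setFactor(2);
    s.scaled();           // caches 2 * 10
    s.setFactor(5);       // the object now holds factor 5 ...
    return s.scaled();    // ... but the cached value from factor 2 comes back
}
```

The second call to scaled() still reflects the old factor, which is exactly the kind of partially updated state the paragraph warns about.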

I've found that a good way to keep constructors manageable is to use the builder pattern. Create a builder object that has its fields set by the programmer, which is then passed to the 'real' object for construction. You can provide default arguments, optional arguments, etc. Combine this with a fluent interface and I think it looks a lot better. Of course, this has the disadvantage of requiring a *lot* of boilerplate, but I think this could be okay in D, as a builder class is exactly the kind of thing that can be automatically generated.
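
For readers who haven't met the pattern, here is a rough C++ sketch of a builder with a fluent interface. Window, WindowBuilder and the default values are all made up for the example; a D version would look much the same, and this is one possible shape rather than a canonical one.

```cpp
#include <string>
#include <utility>

class Window;  // the 'real' object, defined below

// Hypothetical builder: every field has a default, and each setter
// returns *this so that calls can be chained.
class WindowBuilder {
public:
    WindowBuilder& title(std::string t) { title_ = std::move(t); return *this; }
    WindowBuilder& width(int w)  { width_ = w;  return *this; }
    WindowBuilder& height(int h) { height_ = h; return *this; }
    Window build() const;  // defined once Window is a complete type

private:
    friend class Window;   // lets Window's ctor read the collected fields
    std::string title_ = "untitled";
    int width_ = 640;
    int height_ = 480;
};

class Window {
public:
    // The only ctor takes the builder, so the parameter list never grows.
    explicit Window(const WindowBuilder& b)
        : title_(b.title_), width_(b.width_), height_(b.height_) {}

    const std::string& title() const { return title_; }
    int width() const { return width_; }
    int height() const { return height_; }

private:
    std::string title_;
    int width_;
    int height_;
};

inline Window WindowBuilder::build() const { return Window(*this); }
```

A call site then names only what differs from the defaults, in any order, e.g. `Window w = WindowBuilder().title("main").width(800).build();`. The boilerplate cost is visible even at this size: each field appears in both classes.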

I've always found Perl's approach a more balanced way to tackle this problem (even though Perl's OO system as a whole suffers from other, shall we say, idiosyncrasies). In Perl, objects start out as arbitrary key-value pairs, and nothing differentiates them from a regular AA until you call the 'bless' built-in function on them, at which point they become "officially" a member of some particular class. This neatly sidesteps the whole ctor mess: you can initialize the initial AA with whatever values you want, in whatever order you want. When you've finally "kicked it into shape", as the cited paper puts it, you "promote" that set of key-value pairs into an "official" member of the class, and thereafter, you can't simply modify fields anymore except through class methods. This means you now have the possibility of enforcing invariants on the object without crippling the flexibility of constructing it. (Well, OK, in Perl, this last bit isn't necessarily true, but in an ideal implementation of this initialize-bless-use approach, the object's fields would become non-public after being blessed and can only be updated by "official" object methods.)

In the spirit of this approach, I've written some C++ code in the past that looked something like this:

        class BaseClass {
        public:
                // Encapsulate ctor arguments
                struct Args {
                        int baseparm1, baseparm2;
                };
                BaseClass(Args args) {
                        // initialize object based on fields in
                        // BaseClass::Args.
                }
        };

        class MyClass : public BaseClass {
        public:
                // Encapsulate ctor arguments
                struct Args : BaseClass::Args {
                        int parm1, parm2;
                };

                MyClass(Args args) : BaseClass(args) {
                        // initialize object based on fields in args
                }
        };
Basically, the Args structs let the user set up whatever values they want to, in whatever order they wish, then they are "blessed" into real class instances by the ctor. Encapsulating ctor arguments in these structs alleviates the problem of proliferating ctor arguments as the class hierarchy grows: each derived class simply hands off the Args struct (which is itself in a hierarchy that parallels that of the classes) to the base class ctor. All ctors in the class hierarchy need only a single (polymorphic) argument.

This approach also localizes the changes required when you modify base class arguments -- in the old way of having multiple ctor arguments, adding or changing arguments to the base class ctor requires you to update every single derived class ctor accordingly -- very bad. But here, adding a new field to BaseClass::Args requires zero changes to all derived classes, which is a Good Thing(tm).

In some cases, if the class is relatively simple, the private members of the class can simply be themselves an instance of the Args struct, so the ctor could be nothing more than just:

        MyClass(Args args) : BaseClass(args), myArgs(args) {}
which gets rid of that silly baroque dance of naming ctor arguments as _a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c (which can be rather error prone if you mistype a _ somewhere or forget to assign one of the members). Since the private copy of Args is not accessible from outside, class methods can use the values freely without having to worry about inconsistent states -- the ctor can check class invariants before creating the class object, ensuring that the internal copy of Args is in a consistent state.
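
To make the "private copy of Args plus an invariant check" idea concrete, here is a hedged C++ sketch built on the BaseClass/MyClass names from the snippet above. The default values, the accessor functions, the validated() helper and the "parm2 must be non-zero" invariant are all assumptions added for illustration; the original code does not specify them.

```cpp
#include <stdexcept>

class BaseClass {
public:
    // Encapsulated ctor arguments (the default values are an assumption).
    struct Args {
        int baseparm1 = 0;
        int baseparm2 = 0;
    };
    explicit BaseClass(const Args& args) : baseArgs(args) {}
    int baseparm1() const { return baseArgs.baseparm1; }

private:
    Args baseArgs;  // private copy, unreachable from outside
};

class MyClass : public BaseClass {
public:
    struct Args : BaseClass::Args {
        int parm1 = 0;
        int parm2 = 1;
    };

    // validated() runs first, so if it throws, neither the base part nor
    // the private copy is ever constructed.
    explicit MyClass(const Args& args)
        : BaseClass(validated(args)), myArgs(args) {}

    // Methods can rely on the invariant without re-checking it.
    int ratio() const { return myArgs.parm1 / myArgs.parm2; }

private:
    // Hypothetical invariant, chosen only for the example: parm2 != 0.
    static const Args& validated(const Args& a) {
        if (a.parm2 == 0)
            throw std::invalid_argument("parm2 must be non-zero");
        return a;
    }

    Args myArgs;  // private copy of the "blessed" arguments
};
```

Because myArgs is private and only ever written by the ctor, ratio() can divide without a guard; a caller who passes parm2 == 0 gets an exception instead of a half-built object.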
The Args structs themselves, of course, can have ctors that set up sane default values for each field, so that lazy users can simply call:
        MyClass *obj = new MyClass(MyClass::Args());
and get a working, consistent class object with default settings. This way of setting default values also lets the user only change fields that they don't want to use default values for, rather than be constricted by the order of ctor default arguments: if you're unlucky enough to need a non-default value in a later parameter, you're forced to repeat the default values for everything that comes before it.
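
A small side-by-side sketch of that last point, in C++. BoxPositional, Box and the field names are hypothetical, and the defaults are written as C++11 default member initializers rather than as an explicit Args ctor, which is an assumption about how one might do it today.

```cpp
// Positional defaults: to change only `depth`, the caller has to restate
// the defaults for width and height as well.
struct BoxPositional {
    int width, height, depth;
    explicit BoxPositional(int w = 1, int h = 2, int d = 3)
        : width(w), height(h), depth(d) {}
};

// Args-struct style: the defaults live in the struct, and the caller
// overrides only the fields that matter, in any order.
class Box {
public:
    struct Args {
        int width = 1;
        int height = 2;
        int depth = 3;
    };
    explicit Box(const Args& args) : args_(args) {}
    int volume() const { return args_.width * args_.height * args_.depth; }

private:
    Args args_;
};
```

With the positional version the call has to read `BoxPositional p(1, 2, 10);`, repeating two defaults the caller doesn't care about. With the Args version it is `Box::Args a; a.depth = 10; Box b(a);`, and a later change to the default width or height is picked up without touching the call site.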
In D, this approach isn't quite as nice, because D structs don't have inheritance, so you can't simply pass Args from derived class to base class. You'd have to explicitly do something like:

        class BaseClass {
        public:
                struct Args { ...  }
                this(Args args) { ... }
        }

        class MyClass : BaseClass {
        public:
                struct Args {
                        BaseClass.Args base; // <-- explicit inclusion of BaseClass.Args
                        ...
                }
                this(Args args) {
                        super(args.base);       // <-- more verbose than just super(args);
                        ...
                }
        }
Initializing the args also isn't as nice, since user code will have to know exactly which fields are in .base and which aren't. You can't just write, like in C++:

        // C++
        MyClass::Args args;
        args.basefield1 = 123;
        args.field2 = 321;

you'd have to write, in D:

        // D
        MyClass.Args args;
        args.base.basefield1 = 123;
        args.field2 = 321;
which isn't as nice in terms of encapsulation, since ideally user code shouldn't need to care about the exact boundaries between base class and derived class.

I haven't really thought about how this might be made nicer in D, though.


T

See above, this is basically the builder pattern. It's a neat trick, giving your args objects a class hierarchy of their own. I think that one drawback of that, however, is that now you have to maintain *two* class hierarchies. Have you found this to be a problem in practice?

As an aside, you could probably simulate the inheritance of the args objects in D either with alias this or even opDispatch. Still, this means that you need to nest the structs within each other, and this could get silly after 2-3 "generations" of args objects.
