On Monday, 15 July 2013 at 22:29:14 UTC, H. S. Teoh wrote:
I consider myself to be a "systematic" programmer (according to
the
definition in the paper); I can work equally well with ctors
with
arguments vs. create-set-call objects. But I find that
mandatory ctors
with arguments are a pain to work with, *both* to write and to
use.
I also find constructors with multiple arguments a pain to use.
They get difficult to maintain as your project grows. One of my
pet projects has a very shallow class hierarchy, but the
constructors of each object down the tree have many arguments,
with descendants adding on even more. It gets to be a real
headache when you have more than 3 constructors per class to deal
with base class overloads, multiple arguments, etc.
On the usability side, there's the mental workload of having to
remember
which order the arguments appear in (or look it up in the IDE,
or
whatever -- the point is that I can't just type the ctor call
straight
from my head). Then there's the problem of needing to create
objects
required by the ctor before you can call the ctor. In some
cases, this
can be inconvenient -- I always have to remember to set up and
create
other objects before I can create this one, because its ctor
requires
said objects as arguments. Then there's the lack of
flexibility: no
matter what you do, it seems that anything that requires more
than a
single ctor argument inevitably becomes either (1) too complex,
requiring too many arguments, and therefore very difficult to
use, or
(2) too simplistic, and therefore unable to do some things that
I may
want to do (e.g. some fields are default-initialized with no
way to
specify the initial values of the fields, 'cos otherwise the
ctor would
have too many arguments). No matter what you do, it seems almost
impossible to come up with an ideal ctor except in trivial
cases where
it requires only 1 argument or is a default ctor.
Having to create other objects to pass to a constructor is
particularly painful. You'd better pray that they have trivial
constructors, or else things can get hairy really fast. Multiple
nested constructors can also create a large amount of code bloat.
Once the constructor grows large enough, I generally put each
argument on its own line to ensure that it's clear what I'm
calling it with. This has the unfortunate side effect of making
the call span multiple lines. In my opinion, a constructor call
requiring more than 10 lines is an unsightly abomination.
On the writability side, one of my pet peeves is base class
ctors that
require multiple arguments. Every level of inheritance
inevitably adds
more arguments each time, and by the time you're 5-6 levels
down the
class hierarchy, your ctor calls just have an unmanageable
number of
parameters. Not to mention the violation of DRY by requiring
much
redundant typing just to pass arguments from the inherited
class' ctor
up the class hierarchy. Tons of bugs to be had everywhere,
given the
amount of repeated typing needed.
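To make the forwarding problem concrete, here is a minimal C++ sketch. The Widget/Label/Button hierarchy and all of its parameters are hypothetical, invented only for illustration; the point is how each level has to restate every parameter of the levels above it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical three-level hierarchy; every name here is illustrative only.
class Widget {
public:
    Widget(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }
private:
    int x_, y_;
};

class Label : public Widget {
public:
    // x and y are repeated here only so they can be forwarded.
    Label(int x, int y, std::string text)
        : Widget(x, y), text_(std::move(text)) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

class Button : public Label {
public:
    // x, y and text are repeated yet again. Adding a parameter to
    // Widget would mean editing Label and Button as well.
    Button(int x, int y, std::string text, bool enabled)
        : Label(x, y, std::move(text)), enabled_(enabled) {}
    bool enabled() const { return enabled_; }
private:
    bool enabled_;
};

void forwardingExample() {
    Button b(10, 20, "OK", true);
    assert(b.x() == 10 && b.y() == 20);
    assert(b.text() == "OK");
    assert(b.enabled());
}
```

With only three levels the repetition is tolerable; the quoted complaint is about what happens at five or six.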
In the simplest cases, of course, these aren't big issues, but
this kind
of ctor design is clearly not scalable.
OTOH, the create-set-call pattern isn't a panacea either. One of the
biggest problems with this pattern is that you can't guarantee
your
objects are in a consistent state at all times. This is very
bad,
because all your methods will have to check whether some value has been set
yet before they use it. This adds a lot of complexity that
could've been
avoided had everything been set at ctor-time. This also makes
class
invariants needlessly complex. Moreover, I've seen many classes
in this
category exhibit undefined behaviour if you call a
value-setting method
after you start using the object. Too many classes falsely
assume that
you will always call set methods and then "use" methods in that
order.
If you call a set method after calling a "use" method, you're
quite
likely to run into bugs in the class, e.g. part of the object's
state
doesn't reflect the new value you set, because the "use"
methods were
written with the assumption that when they were called the
first time,
the values you set earlier won't change thereafter.
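A minimal C++ sketch of the set-after-use failure described above, assuming a hypothetical Greeter class whose "use" method caches its result. The class and its methods are made up for illustration and are not taken from any real library.

```cpp
#include <cassert>
#include <string>

// Hypothetical create-set-call class; names are illustrative only.
class Greeter {
public:
    void setName(const std::string& name) { name_ = name; }

    // "Use" method: builds the greeting once and caches it, silently
    // assuming that setName() is never called again afterwards.
    const std::string& greeting() {
        if (!cached_) {
            greeting_ = "Hello, " + name_;
            cached_ = true;
        }
        return greeting_;
    }

private:
    std::string name_;
    std::string greeting_;
    bool cached_ = false;
};

void staleStateExample() {
    Greeter g;
    g.setName("Alice");
    assert(g.greeting() == "Hello, Alice");
    g.setName("Bob");                        // set after use
    assert(g.greeting() == "Hello, Alice");  // stale: the new name is ignored
}
```

Nothing here is undefined behaviour in the C++ sense; the object just ends up with part of its state out of sync with what the caller last set, which is the kind of bug the paragraph above describes.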
I've found that a good way to keep constructors manageable is to
use the builder pattern. Create a builder object that has its
fields set by the programmer, which is then passed to the 'real'
object for construction. You can provide default arguments,
optional arguments, etc. Combine this with a fluent interface and
I think it looks a lot better. Of course, this has the
disadvantage of requiring a *lot* of boilerplate, but I think
this could be okay in D, as a builder class is exactly the kind
of thing that can be automatically generated.
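Roughly what I have in mind, sketched in C++ so it sits next to the C++ example further down. Window, WindowBuilder and their fields are hypothetical names picked for illustration, and this is one possible shape for a builder rather than a definitive one.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Window;  // the 'real' object, defined below

// Hypothetical builder: holds the settings, each with a default.
class WindowBuilder {
public:
    // Each setter returns *this so that calls can be chained.
    WindowBuilder& title(std::string t) { title_ = std::move(t); return *this; }
    WindowBuilder& width(int w)  { width_ = w;  return *this; }
    WindowBuilder& height(int h) { height_ = h; return *this; }
    Window build() const;  // defined once Window is complete
private:
    friend class Window;
    std::string title_ = "untitled";
    int width_ = 640;
    int height_ = 480;
};

class Window {
public:
    // The real object takes a single argument: the builder.
    explicit Window(const WindowBuilder& b)
        : title_(b.title_), width_(b.width_), height_(b.height_) {}
    const std::string& title() const { return title_; }
    int width() const { return width_; }
    int height() const { return height_; }
private:
    std::string title_;
    int width_, height_;
};

inline Window WindowBuilder::build() const { return Window(*this); }

void builderExample() {
    // Only the non-default settings are named, in any order.
    Window w = WindowBuilder().title("Editor").height(600).build();
    assert(w.title() == "Editor");
    assert(w.width() == 640);   // default kept
    assert(w.height() == 600);
}
```

The boilerplate is visible even in a toy like this: every field appears in the builder, in a setter, and again in the real class.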
I've always found Perl's approach a more balanced way to tackle
this
problem (even though Perl's OO system as a whole suffers from
other,
shall we say, idiosyncrasies). In Perl, objects start out as
arbitrary
key-value pairs, and nothing differentiates them from a regular
AA until
you call the 'bless' built-in function on them, at which point
they
become "officially" a member of some particular class. This
neatly
sidesteps the whole ctor mess: you can initialize the initial
AA with
whatever values you want, in whatever order you want. When you've finally
"kicked it into shape", as the cited paper puts it, you
"promote" that
set of key-value pairs into an "official" member of the class,
and
thereafter, you can't simply modify fields anymore except
through class
methods. This means you now have the possibility of enforcing
invariants
on the object without crippling the flexibility of constructing
it.
(Well, OK, in Perl, this last bit isn't necessarily true, but
in an
ideal implementation of this initialize-bless-use approach, the
object's
fields would become non-public after being blessed and can only
be
updated by "official" object methods.)
In the spirit of this approach, I've written some C++ code in
the past
that looked something like this:
    class BaseClass {
    public:
        // Encapsulate ctor arguments
        struct Args {
            int baseparm1, baseparm2;
        };
        BaseClass(Args args) {
            // initialize object based on fields in
            // BaseClass::Args.
        }
    };
    class MyClass : public BaseClass {
    public:
        // Encapsulate ctor arguments
        struct Args : BaseClass::Args {
            int parm1, parm2;
        };
        MyClass(Args args) : BaseClass(args) {
            // initialize object based on fields in args
        }
    };
Basically, the Args structs let the user set up whatever values
they
want to, in whatever order they wish, then they are "blessed"
into real
class instances by the ctor. Encapsulating ctor arguments in
these
structs alleviates the problem of proliferating ctor arguments
as the
class hierarchy grows: each derived class simply hands off the
Args
struct (which is itself in a hierarchy that parallels that of
the
classes) to the base class ctor. Every ctor in the class hierarchy needs
only a single (polymorphic) argument.
This approach also localizes the changes required when you
modify base
class arguments -- in the old way of having multiple ctor
arguments,
adding or changing arguments to the base class ctor requires
you to
update every single derived class ctor accordingly -- very bad.
But
here, adding a new field to BaseClass::Args requires zero
changes to all
derived classes, which is a Good Thing(tm).
In some cases, if the class is relatively simple, the private
members of
the class can simply be themselves an instance of the Args
struct, so
the ctor could be nothing more than just:
MyClass(Args args) : BaseClass(args), myArgs(args) {}
which gets rid of that silly baroque dance of naming ctor
arguments as
_a, _b, _c, then writing in the ctor body a=_a, b=_b, c=_c
(which can be
rather error prone if you mistype a _ somewhere or forget to
assign one
of the members). Since the private copy of Args is not
accessible from
outside, class methods can use the values freely without having
to worry
about inconsistent states -- the ctor can check class
invariants before
creating the class object, ensuring that the internal copy of
Args is in
a consistent state.
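A minimal C++ sketch of the "check invariants, then keep a private copy" idea. The Range class, its fields, and the lo <= hi invariant are hypothetical and chosen only to have something to validate.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class; field names and the invariant are illustrative only.
class Range {
public:
    struct Args {
        int lo = 0;
        int hi = 0;
    };

    // Validate before the object exists. Afterwards args_ is private,
    // so nothing outside the class can put it into a bad state.
    explicit Range(const Args& args) : args_(checked(args)) {}

    // Methods can rely on lo <= hi without re-checking.
    int length() const { return args_.hi - args_.lo; }

private:
    static const Args& checked(const Args& a) {
        if (a.lo > a.hi) throw std::invalid_argument("Range: lo > hi");
        return a;
    }
    Args args_;
};

void invariantExample() {
    Range::Args a;
    a.hi = 10;   // fields can be set in any order
    a.lo = 3;
    assert(Range(a).length() == 7);

    a.lo = 20;   // now inconsistent: lo > hi
    bool rejected = false;
    try {
        Range r(a);
    } catch (const std::invalid_argument&) {
        rejected = true;
    }
    assert(rejected);
}
```

Throwing from the ctor is just one way to reject bad arguments; the point is only that the check happens once, at construction time.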
The Args structs themselves, of course, can have ctors that
set up sane
default values for each field, so that lazy users can simply
call:
MyClass *obj = new MyClass(MyClass::Args());
and get a working, consistent class object with default
settings. This
way of setting default values also lets the user only change
fields that
they don't want to use default values for, rather than be
constricted by
the order of ctor default arguments: if you're unlucky enough
to need a
non-default value in a later parameter, you're forced to repeat
the
default values for everything that comes before it.
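The contrast with positional default arguments can be sketched like this. ConnArgs, its fields, the default values, and the positional() helper are all hypothetical, and default member initializers stand in here for an Args ctor that sets the defaults.

```cpp
#include <cassert>

// Hypothetical settings struct; names and defaults are illustrative only.
struct ConnArgs {
    int port = 8080;
    int timeoutMs = 5000;
    int retries = 3;
};

// The same settings expressed as positional default arguments, for contrast.
ConnArgs positional(int port = 8080, int timeoutMs = 5000, int retries = 3) {
    ConnArgs a;
    a.port = port;
    a.timeoutMs = timeoutMs;
    a.retries = retries;
    return a;
}

void defaultsExample() {
    // Args struct: touch only the field that differs from its default.
    ConnArgs a;
    a.retries = 5;

    // Default arguments: to reach retries, the earlier defaults
    // have to be spelled out again.
    ConnArgs b = positional(8080, 5000, 5);

    assert(a.port == b.port);
    assert(a.timeoutMs == b.timeoutMs);
    assert(a.retries == b.retries);
}
```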
In D, this approach isn't quite as nice, because D structs
don't have
inheritance, so you can't simply pass Args from derived class
to base
class. You'd have to explicitly do something like:
    class BaseClass {
    public:
        struct Args { ... }
        this(Args args) { ... }
    }
    class MyClass : BaseClass {
    public:
        struct Args {
            BaseClass.Args base; // <-- explicit inclusion of BaseClass.Args
            ...
        }
        this(Args args) {
            super(args.base); // <-- more verbose than just super(args);
            ...
        }
    }
Initializing the args also isn't as nice, since user code will
have to
know exactly which fields are in .base and which aren't. You
can't just
write, like in C++:
// C++
MyClass::Args args;
args.basefield1 = 123;
args.field2 = 321;
you'd have to write, in D:
// D
MyClass.Args args;
args.base.basefield1 = 123;
args.field2 = 321;
which isn't as nice in terms of encapsulation, since ideally user code
shouldn't need to care about the exact boundaries between base class and
derived class.
I haven't really thought about how this might be made nicer in
D,
though.
T
See above, this is basically the builder pattern. It's a neat
trick, giving your args objects a class hierarchy of their own. I
think that one drawback of that, however, is that now you have to
maintain *two* class hierarchies. Have you found this to be a
problem in practice?
As an aside, you could probably simulate the inheritance of the
args objects in D either with alias this or even opDispatch.
Still, this means that you need to nest the structs within
each other, and this could get silly after 2-3 "generations" of
args objects.