The other day I saw the green field Haskell discussion on Reddit. Of
course, I could give you a very long list (ha!) of things that I would
change about Haskell. (This would probably begin with expunging
Monad.fail with extreme prejudice...) But there's one particular feature
which is non-trivial enough to be worth discussing here...
Haskell has ADTs. Most of the time, these work great. As I've written in
several other places (but possibly not here), OO languages tend to
factor the problem the other way. That is, if I want a binary tree, an
OO language makes me split the type and all of its operations into three
parts (an abstract base class, a branch subclass, and a leaf subclass).
Adding each new operation requires adding an abstract version of it to
the abstract base class, and putting half of the implementation into
each concrete subclass.
Haskell is factored the other way. The definition of the type goes in
one place. The definition of each operation over it goes in one place.
More can be added at any time. In particular, the OO approach doesn't
let you add new tree operations without altering the source code for the
tree classes. The ADT version *does* let you add new operations -
provided the tree structure is visible.
This is not to say that the OO approach is wrong, of course. The
important thing about a binary tree is that there are exactly two kinds
of nodes - leaves and branches - and there will *always* be exactly two
kinds. That's what makes it a binary tree! [OK, no it isn't, but bare
with me.]
If, instead of a binary tree, or an abstract syntax tree, or some other
fixed, unchanging structure, we wanted to deal with, say, bank
accounts... Well, there are a bazillion kinds of bank account. And more
might be added at any time. Here an ADT is utterly the wrong thing to
do. It would have a bazillion constructors, and all the processing for a
particular bank account type would be split between dozens of seperate
functions. By contrast, an OO language would put each account type and
all its associated processing in the same place - the class definition.
In summary, problems can be factored two ways. ADTs do it one way,
classes do it the other way, and both can be appropriate. OO languages
use classes, but, happily, Haskell has ADTs *and* classes! :-D And not
classes in the same sense as OO classes; the way Haskell does it is
actually better, IMHO.
Anyway, coming back on-topic... ADTs are great, but sometimes they don't
let me easily express exactly what I want. Specifically, sometimes I
have one type which is really a subset of another type, or perhaps a
superset of several times.
You can build subsets using GADTs, after a fashion:
data Foo
data Bar
data Foobar x where
Foobar1 ... :: Foobar Foo
Foobar2 ... :: Foobar Bar
Foobar3 ... :: Foobar x
Foobar4 ... :: Foobar Bar
Foobar5 ... :: Foobar Foo
Now any expression of type Foobar Foo is guaranteed to contain only
Foobar1, Foobar3 or Foobar5, while any expression of type Foobar Bar is
guaranteed to contain only Foobar2, Foobar3 or Foobar4. As the tangle of
sets becomes more complicated, this approach becomes more difficult to
apply.
You can of course build supersets using an even more mundane Haskell
construct. The canonical example is Either. If I want a field that holds
a Foo *or* a Bar, I can trivially write Either Foo Bar.
The trouble is, this entails an extra level of indirection. In order to
access a Foo or a Bar, I now have to strip off the Left or Right
constructor to get to it. To construct a value, I have to stick in the
constructor. Sometimes this is just fine, even desirable. For example, I
recently wrote a function who's type is
spans :: (x - Bool) - [x] - [Either [x] [x]]
Here the Left and Right constructors are actually /telling/ me
something. They convey /information/.
On the other hand, in the case of Either Foo Bar, the constructor tells
me nothing. Consider:
data Foo = Foo1 | Foo2 | Foo3
data Bar = Bar1 | Bar2 | Bar3
foobar :: Either Foo Bar - x
foobar e =
case e of
Left Foo1 - ...
Left Foo2 - ...
Left Foo3 - ...
Right Bar1 - ...
Right Bar2 - ...
Right Bar3 - ...
There is, logically, no particular reason why we shouldn't simplify the
function to just
foobar Foo + Bar - x
foobar fb =
case fb of
Foo1 - ...
Foo2 - ...
Foo3 - ...
Bar1 - ...
Bar2 - ...
Bar3 - ...
The set of possible constructors for Foo and for Bar are disjoint. There
is no possibility of ambiguity here. The type system itself even /uses/
this fact to infer the type of literal constructor applications! And
yet, you cannot write code such as the above. Not in Haskell, anyway.
So what is the use case here? Well, consider for a moment the problem of
parsing Haskell source code. Haskell contains both patters and
expressions. Now actually, it turns out that a