Gentle Haskell-1.3 people,

This message argues for a particular approach to abstract data types in Haskell.

The problem
~~~~~~~~~~~
We have long been a bit unhappy with Haskell's treatment of abstract data types.
It's easy to make a new algebraic data type abstract, just by not exporting its
constructors.  But it ain't so easy when the new abstract type is to be
isomorphic with an existing type.

One could use a new algebraic data type anyway:

        module Stack( Stk, push, pop )
        
        data Stk a = Stk [a]
        
        push x (Stk xs) = Stk (x:xs)
        pop (Stk (x:xs)) = Stk xs

There are two problems here:

        - it is inefficient; there's lots of wrapping and unwrapping
          of the list (representation type) inside the Stk constructor

        - it defines a new type (Stk a) which is *not* isomorphic to the old
          one [a].  The difference is that (Stk a) is lifted: _|_ is distinct
          from (Stk _|_).  Indeed this difference leads directly to the
          implementation cost mentioned above.

One might argue that all uses of the Stk constructor clutter things up;
but on the other hand they *are* helpful in making explicit just where the
injections into the ADT and projections out of it are occurring.

A bad solution
~~~~~~~~~~~~~~
An approach favoured by some is to try to extend type synonyms.
After all, we usually use type synonyms when we want a name for an existing type.

        module Stack( Stk, push, pop )
        
        type Stk a = [a]
        push = (:)
        pop = tail

The above code is currently illegal: type synonyms must be exported with their
full definition, thus module Stack( Stk(..), push, pop )
But you get the idea.

But this approach has obvious difficulties: what is the exported type of push?

        push :: a -> [a] -> [a]
or      push :: a -> Stk a -> [a]
or      push :: a -> [a] -> Stk a
or      push :: a -> Stk a -> Stk a

Well, one can just about agree that if there is an explicit type signature for a
top-level exported value, then the exported type uses the synonym.  But it's
easy to forget, and the only way you'll find out if you have is when a "user" of
the ADT finds curious type errors.

But what *really* kills this idea is the interaction with type classes.
We *must* be able to give instances for the ADT which differ from the 
instances for its representation type.  For example, we might want, say,

        instance Ord (Stk a) where
           s1 < s2 = length s1 < length s2

The trouble is that it isn't clear which overloadings to use when typechecking
the stack module:

        Stack (Stk, push, pop, foo)
        
        type Stk a = [a]

        foo :: Stk a -> Stk a -> Bool
        foo (x:xs) (y:ys) = xs < reverse ys

Now, should the "<" be done using the instance for Stk or for list?  Yikes!

Notice that everything is quite clear if we use the algebraic datatype approach:

        Stack (Stk, push, pop, foo)
        
        data Stk a = Stk a

        foo :: Stk a -> Stk a -> Bool
        foo (Stk (x:xs)) (Stk (y:ys)) = xs < reverse ys 
                                        -- < on lists

        foo (Stk (x:xs)) (Stk (y:ys)) = Stk xs < Stk (reverse ys)
                                        -- < on Stks


Addressing the efficiency issue
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since the efficiency problem arises from the (un-desired) addition of
an extra point to the domain, the right thing to do is simply to get rid of
it.  What that boils down to is making the constructor strict:

        data Stk a = Stk !a

where the "!" says "strict".  Or, maybe the syntax should be different

        abstype Stk a = Stk a

Now the implementation is free to implement the Stk constructor by nothing at
all; there is no pattern-matching or construction at all in the implementation.
The Stk constructors now simply serve to guide the type checker, which is just
what we want.

Conclusion
~~~~~~~~~~
There is little new in any of this.  The Haskell committee were well aware of
the difficulties with exporting type synonyms without their definitions, which
is why the language prohibits it.

What *is* new (to me at least) is the recognition that there's a fundamental
mis-match between type classes and ADTs a la Miranda (TM); and that the original
idea of abstraction via algebraic data type declarations is a Good One.

Incidentally, I believe we need to be careful about extending the idea of strict
constructors to all constructors (a la hbc), but that's another story.

Simon 

Reply via email to