With the sudden upsurge of interest in records, and Paul's recent call
for discussion on the future of Haskell, I thought that it might be
appropriate to post the following article describing some of the
alternatives for adding records to Haskell.  I originally wrote and
distributed this to a number of people back in 1993.  I don't think
that too much has changed since then, except for John Peterson's
proposals which are not addressed here.

-----------------------------------------------------------------------------
             Some Alternatives for Records in Haskell 2


A SURVEY OF THE OPTIONS:

It seems that there are (at least) three approaches to consider:

  1) No records at all.  Complete compatibility with current Haskell
     definition.  Already implemented in three Haskell systems.  The
     obvious choice for Haskell 1.3.

  2) The Standard ML approach to records.  The set of labels in any
     particular must be uniquely determined by context, possibly by
     an explicit type signature or by the use of a completely
     specified record expression.  No precedent for this in Haskell
     systems, but Standard ML systems do support this.

  3) Extensible records, probably tied in with the use of type
     classes.  An experimental version of Standard ML supporting
     a subset of this kind of system has been implemented, but
     there are many unanswered questions about how well such a
     system would behave in practice.

Each of these is completely compatible with the preceding options
so, if we decide to go for option (n) today, we can always go for
option (n+1) later without:

 (a) prohibiting any programs that were previously acceptable
     (syntax, type correctness etc.)

 (b) reducing the efficiency of programs that were written with
     an earlier system.


QUESTIONS OF SYNTAX:

While I don't like to bring up syntax this early, it seems to be
one of the most important problems that needs to be addressed.
Let's start by looking at some of the simplest operators that we
might want to support:

  o  Record construction.  Standard ML uses braces and lists of
     labeled items to write record expressions.  For example,
     {} denotes the empty record, {x=2, y=True} denotes a record
     with two fields, equal to {y=True, x=2}.  Although Haskell
     already uses braces for layout, I believe that it would be
     possible to add this syntax without conflict.  But it could
     be a little confusing.

     I'd suggest that we keep an analogy between records and
     labeled products.  The already familiar unit value ()
     is the empty record while other records could be written
     as tuples like (x=2, y=True) or (y=True, x=2).

     Type syntax can follow this: () being the type of the empty
     record, (x::Int, y::Bool) the type of the two records above.

  o  Field selection.  Several alternatives here.  Many languages
     use an infix dot for field selection for example r.x to get
     the x field of record x.  Haskell already uses infix dot
     for function composition, so we'd have some problems with
     this exact syntax.  Maybe we could find another character to
     denote field selection but (a) I doubt Haskell has left any
     characters for us to choose, and (b) if we did use another
     character, we'd loose the advantages of using a `standard'
     notation.

     Standard ML uses infix dot for structure selection, so they
     have the same problem.  Their solution is to write  #name r
     to select the name field of record r.  This is (IMHO) a little
     bit ugly, but keeps the namespaces for labels an regular
     functions quite distinct.  Note that, in Standard ML, a label
     selector #name cannot be assigned a type independently of the
     context in which it appears.

     Another alternative is to use the same namespace for functions
     and labels.  That way, name r gets us the name field of r.
     This looks prettier, but now we have the problem of how the
     system knows when a function (like `name' here) should be
     treated as a label selector.  I have yet to find a really
     satisfactory way to do this.

     With all the problems above, it's good to know there is a
     promising alternative ...
 
  o  Pattern matching.  As with most datatypes in Haskell, pattern
     matching is a natural way to get at the components of a record.
     For example:

        f           :: Num a => (l::a, m::a) -> a
        f (l=x, m=y) = x + y

     If we didn't want to write all of the fields of a record then
     we could use a syntax like:

        f        :: (l::Int, y::Bool) -> Int
        f (r|l=x) = x + 1

     Standard ML allows a form of ellipsis here in place of the
     `row variable' r ... but having a variable here will be useful
     if we wanted to support extensible records.

     Incidentally, according to the definition of Standard ML, the
     record selector function #name is treated as special syntax for
     a lambda abstraction of the form:  \{name,...} -> name.

     Problems with pattern matching:

       -  if we want to use a lot of differently named components
          from a record, then writing all those components and
          pattern matching is a *real* pain compared with selector
          functions.  Of course, the user can always write their
          own selector functions to reduce this problem ...

       -  Standard ML allows us to abbreviate record patterns like
          {x=x, y=y} to {x, y} ... and most Standard ML programs use
          the abbreviation more than the expanded form.  We can't do
          this with the syntax I've suggested because there is nothing
          to distinguish a record (x, y) from the tuple (x, y).  So
          maybe we should stick with braces instead of parentheses...

       -  With the system of extensible records, pattern matching can
          be used to define field selector functions such as:

              x {x=y,...} = y

          (This isn't possible in the Standard ML system.)  However,
          if programmers are going to do this a lot, we'd better make
          certain that functions like this can be inlined automatically
          (possibly in modules other than the one in which they are
          defined) or there could be a big performance hit.  Providing
          special syntax for record selectors in addition to the use of
          pattern matching would significantly reduce this problem.


OPTION (2) -- The Standard ML approach:

Standard ML requires the type of all records -- or, at least, the set
of all field labels -- to be uniquely determined by context.  So, for
example:

      f {x, ...} = x

will be rejected, but adding a type signature like:

      f :: (x::a, y::Bool, z::[(a,b)]) -> a

will allow the definition of f to pass.  Type declarations aren't the
only way this can happen.  For example:

     h (r@{x, ...}) = if x==0 then {x=12, y=True} else r

Since the fields of a record are uniquely determined at compile-time,
we can always figure out a layout for the record and access the various
components from fixed offsets from the start of the record.

This scheme seems to work well enough as far as it goes.  The main
drawbacks seem to be that it requires more explicit type annotations
than we might hope for, and it doesn't provide a very flexible form
of records.  On the other hand, all this discussion on records was
prompted by some comments on the Haskell mailing list ... if we wanted
to satisfy those needs, this would probably be the option to go for.


OPTION (3) -- Extensible records:

Here's an outline of a system extensible records that can give
the same performance as the Standard ML approach if the set of labels
in a record is uniquely determined by context.  I won't go into
much detail about how it can be implemented; it's basically just
a kind of type classes.

First we extend the language of types to allow record types of the
form (r | x::t, y::s), for example, representing records of type r
extended with two fields x and y of type t and s respectively.  We
will use type class constraints (or similar) to ensure that r does
not already include fields labeled x or y.  I'll write r\l for the
constraint expressing the fact that r doesn't have an l field.
(If you like, just think of `\l' as a type class, written using postfix
syntax rather than the normal prefix version.)

Terms follow this notation.  For example, the function to select
the x field from a record might be written as:

    f          :: r\x => (r | x::t) -> t
    f (r | x=x) = x

More generally, for each label l, we might want to define functions:

   (_.l)    :: r\l => (r|l::t) -> t              -- selector
   (_\l)    :: r\l => (r|l::t) -> r              -- restriction
   (_|l=_)  :: r\l => r -> t -> (r|l::t)         -- extension
   (_|l:=_) :: r\l => (r|l::t) -> s -> (r|l::s)  -- update

Actually, the class constraints here are superfluous because a type
(r | x::t) is only well formed if r\x holds.  So we could suppress
the constraints and write:

   f        :: (r | x::t) -> t
   (_.l)    :: (r|l::t) -> t
   (_\l)    :: (r|l::t) -> r
   (_|l=_)  :: r -> t -> (r|l::t)
   (_|l:=_) :: (r|l::t) -> s -> (r|l::s)

However, there are examples where constraints are necessary:

   g  :: r\x => r -> r
   g r = ((r|x=2)\x)

A little syntactic sugar would be useful here so we can write stuff
like:

    type Point = (x::Int, y::Int)

    move                  :: (r|Point) -> (Int,Int) -> (r|Point)
    move (r| x, y) (dx,dy) = (r| x:=x+dx, y:=y+dy)

As with type classes, each constraint requires a form of `dictionary'
parameter -- actually an integer giving the position of a field in a
record.  If the types of records are known then we can work out these
values at compile-time and access the components of a record directly.
On the other hand, if we have to pass a lot of parameters around at
run-time, then performance might suffer.  This is an example where
practical experience would help us to judge what the real performance
issues are likely to be.  (Incidentally, there is an extension of the
current monomorphism restriction that could be used to avoid nasty
performance surprises.  Details omitted for the time being.)


OTHER ISSUES:

There are a couple of additional points that should be considered
before we could add records to Haskell 1.3:

  o  What do we want to do about standard overloaded operations like
     equality, comparison, and the Text class stuff?  I imagine that
     people will want to compare and print records but, while it is
     reasonably easy to come up with a scheme for these operations,
     there are big problems in finding a nice approach that can be
     implemented cleanly and efficiently.

  o  Standard ML treats tuples as a special case of records; (2,True)
     is just an abbreviation for {1=2, 2=True}.  Maybe haskers
     would like something similar ... ?


-----------------------------------------------------------------------------

Records of one form or another would be an important step forward for
Haskell.  The responses that I have received to previous postings
describing record extensions indicated very strong support from people
who are actually using the language `in anger'.  (or should that be
`in comparitive bliss'? :-)

All the best,
Mark

Reply via email to