With the sudden upsurge of interest in records, and Paul's recent call for discussion on the future of Haskell, I thought that it might be appropriate to post the following article describing some of the alternatives for adding records to Haskell. I originally wrote and distributed this to a number of people back in 1993. I don't think that too much has changed since then, except for John Peterson's proposals which are not addressed here. ----------------------------------------------------------------------------- Some Alternatives for Records in Haskell 2 A SURVEY OF THE OPTIONS: It seems that there are (at least) three approaches to consider: 1) No records at all. Complete compatibility with current Haskell definition. Already implemented in three Haskell systems. The obvious choice for Haskell 1.3. 2) The Standard ML approach to records. The set of labels in any particular must be uniquely determined by context, possibly by an explicit type signature or by the use of a completely specified record expression. No precedent for this in Haskell systems, but Standard ML systems do support this. 3) Extensible records, probably tied in with the use of type classes. An experimental version of Standard ML supporting a subset of this kind of system has been implemented, but there are many unanswered questions about how well such a system would behave in practice. Each of these is completely compatible with the preceding options so, if we decide to go for option (n) today, we can always go for option (n+1) later without: (a) prohibiting any programs that were previously acceptable (syntax, type correctness etc.) (b) reducing the efficiency of programs that were written with an earlier system. QUESTIONS OF SYNTAX: While I don't like to bring up syntax this early, it seems to be one of the most important problems that needs to be addressed. Let's start by looking at some of the simplest operators that we might want to support: o Record construction. Standard ML uses braces and lists of labeled items to write record expressions. For example, {} denotes the empty record, {x=2, y=True} denotes a record with two fields, equal to {y=True, x=2}. Although Haskell already uses braces for layout, I believe that it would be possible to add this syntax without conflict. But it could be a little confusing. I'd suggest that we keep an analogy between records and labeled products. The already familiar unit value () is the empty record while other records could be written as tuples like (x=2, y=True) or (y=True, x=2). Type syntax can follow this: () being the type of the empty record, (x::Int, y::Bool) the type of the two records above. o Field selection. Several alternatives here. Many languages use an infix dot for field selection for example r.x to get the x field of record x. Haskell already uses infix dot for function composition, so we'd have some problems with this exact syntax. Maybe we could find another character to denote field selection but (a) I doubt Haskell has left any characters for us to choose, and (b) if we did use another character, we'd loose the advantages of using a `standard' notation. Standard ML uses infix dot for structure selection, so they have the same problem. Their solution is to write #name r to select the name field of record r. This is (IMHO) a little bit ugly, but keeps the namespaces for labels an regular functions quite distinct. Note that, in Standard ML, a label selector #name cannot be assigned a type independently of the context in which it appears. Another alternative is to use the same namespace for functions and labels. That way, name r gets us the name field of r. This looks prettier, but now we have the problem of how the system knows when a function (like `name' here) should be treated as a label selector. I have yet to find a really satisfactory way to do this. With all the problems above, it's good to know there is a promising alternative ... o Pattern matching. As with most datatypes in Haskell, pattern matching is a natural way to get at the components of a record. For example: f :: Num a => (l::a, m::a) -> a f (l=x, m=y) = x + y If we didn't want to write all of the fields of a record then we could use a syntax like: f :: (l::Int, y::Bool) -> Int f (r|l=x) = x + 1 Standard ML allows a form of ellipsis here in place of the `row variable' r ... but having a variable here will be useful if we wanted to support extensible records. Incidentally, according to the definition of Standard ML, the record selector function #name is treated as special syntax for a lambda abstraction of the form: \{name,...} -> name. Problems with pattern matching: - if we want to use a lot of differently named components from a record, then writing all those components and pattern matching is a *real* pain compared with selector functions. Of course, the user can always write their own selector functions to reduce this problem ... - Standard ML allows us to abbreviate record patterns like {x=x, y=y} to {x, y} ... and most Standard ML programs use the abbreviation more than the expanded form. We can't do this with the syntax I've suggested because there is nothing to distinguish a record (x, y) from the tuple (x, y). So maybe we should stick with braces instead of parentheses... - With the system of extensible records, pattern matching can be used to define field selector functions such as: x {x=y,...} = y (This isn't possible in the Standard ML system.) However, if programmers are going to do this a lot, we'd better make certain that functions like this can be inlined automatically (possibly in modules other than the one in which they are defined) or there could be a big performance hit. Providing special syntax for record selectors in addition to the use of pattern matching would significantly reduce this problem. OPTION (2) -- The Standard ML approach: Standard ML requires the type of all records -- or, at least, the set of all field labels -- to be uniquely determined by context. So, for example: f {x, ...} = x will be rejected, but adding a type signature like: f :: (x::a, y::Bool, z::[(a,b)]) -> a will allow the definition of f to pass. Type declarations aren't the only way this can happen. For example: h (r@{x, ...}) = if x==0 then {x=12, y=True} else r Since the fields of a record are uniquely determined at compile-time, we can always figure out a layout for the record and access the various components from fixed offsets from the start of the record. This scheme seems to work well enough as far as it goes. The main drawbacks seem to be that it requires more explicit type annotations than we might hope for, and it doesn't provide a very flexible form of records. On the other hand, all this discussion on records was prompted by some comments on the Haskell mailing list ... if we wanted to satisfy those needs, this would probably be the option to go for. OPTION (3) -- Extensible records: Here's an outline of a system extensible records that can give the same performance as the Standard ML approach if the set of labels in a record is uniquely determined by context. I won't go into much detail about how it can be implemented; it's basically just a kind of type classes. First we extend the language of types to allow record types of the form (r | x::t, y::s), for example, representing records of type r extended with two fields x and y of type t and s respectively. We will use type class constraints (or similar) to ensure that r does not already include fields labeled x or y. I'll write r\l for the constraint expressing the fact that r doesn't have an l field. (If you like, just think of `\l' as a type class, written using postfix syntax rather than the normal prefix version.) Terms follow this notation. For example, the function to select the x field from a record might be written as: f :: r\x => (r | x::t) -> t f (r | x=x) = x More generally, for each label l, we might want to define functions: (_.l) :: r\l => (r|l::t) -> t -- selector (_\l) :: r\l => (r|l::t) -> r -- restriction (_|l=_) :: r\l => r -> t -> (r|l::t) -- extension (_|l:=_) :: r\l => (r|l::t) -> s -> (r|l::s) -- update Actually, the class constraints here are superfluous because a type (r | x::t) is only well formed if r\x holds. So we could suppress the constraints and write: f :: (r | x::t) -> t (_.l) :: (r|l::t) -> t (_\l) :: (r|l::t) -> r (_|l=_) :: r -> t -> (r|l::t) (_|l:=_) :: (r|l::t) -> s -> (r|l::s) However, there are examples where constraints are necessary: g :: r\x => r -> r g r = ((r|x=2)\x) A little syntactic sugar would be useful here so we can write stuff like: type Point = (x::Int, y::Int) move :: (r|Point) -> (Int,Int) -> (r|Point) move (r| x, y) (dx,dy) = (r| x:=x+dx, y:=y+dy) As with type classes, each constraint requires a form of `dictionary' parameter -- actually an integer giving the position of a field in a record. If the types of records are known then we can work out these values at compile-time and access the components of a record directly. On the other hand, if we have to pass a lot of parameters around at run-time, then performance might suffer. This is an example where practical experience would help us to judge what the real performance issues are likely to be. (Incidentally, there is an extension of the current monomorphism restriction that could be used to avoid nasty performance surprises. Details omitted for the time being.) OTHER ISSUES: There are a couple of additional points that should be considered before we could add records to Haskell 1.3: o What do we want to do about standard overloaded operations like equality, comparison, and the Text class stuff? I imagine that people will want to compare and print records but, while it is reasonably easy to come up with a scheme for these operations, there are big problems in finding a nice approach that can be implemented cleanly and efficiently. o Standard ML treats tuples as a special case of records; (2,True) is just an abbreviation for {1=2, 2=True}. Maybe haskers would like something similar ... ? ----------------------------------------------------------------------------- Records of one form or another would be an important step forward for Haskell. The responses that I have received to previous postings describing record extensions indicated very strong support from people who are actually using the language `in anger'. (or should that be `in comparitive bliss'? :-) All the best, Mark