Haskell 1.3 Prelude changes

"John C. Peterson" Fri, 8 Sep 1995 23:01:27 -0400

Changes to the Haskell 1.3 Prelude

The following changes have been proposed (or accepted) for Haskell 1.3.


* Reorganize the Ord class                  
* Add succ and diff to Enum                 
* Add new class "Bounded"                   
* Add strictness annotation to Complex and Ratio
* Use Int in take, drop and splitAt         
* Add replicate, lookup, curry and uncurry  
* Move functions into libraries             
* Non-overloaded versions of PreludeList functions 
* Numeric Issues                            
* Simplify lex                              
* Add undefined                             
* Monad Class                               

 
 Changes to Ord

In Haskell 1.2, two comparisons are required to do a "three way branch":

    if x == y then ...
    else if x < y then ...
    else ...

Even a standard two way branch can be inefficient - here's the 
default definition of "<" in the standard prelude:

x < y = x <= y && x /= y

Instead of defining a <= operator which returns just two values, it
is almost as easy to define an operator which returns three different
values:

   case compare x y of
    EQ -> ...
    LT -> ...
    GT -> ...

The constructors  EQ , LT ,and GT belong to
a new type: Ordering.
In addition to this efficiency problem, many uses of Ord such as 
sorting or operations on ordered binary trees assume total ordering.
The compare operation formalizes this concept: it can not
return a value which indicates that its arguments are unordered.
Programmers are free to define a class for partial orderings; here, we
simply state that Ord is reserved for total orderings.

Proposed Changes:

 * Add a new type:

data Ordering = LT | EQ | GT  deriving (Eq,Ord,Ix,Enum,Bounded,Text)

 * Delete comment in definition of class Ord which explains how
   to define min and max for both total and partial orders.
 * Change definition of Ord to

class Ord a where
   compare :: a -> a -> Ordering
   (<), (<=), (>=), (>):: a -> a -> Bool
   max, min            :: a -> a -> a
   -- circular default definition:
   -- either <= or compare must be explicitly provided
     x < y              = compare x y == LT
     x <= y          = compare x y /= GT
     x > y              = compare x y == GT
     x >= y             = compare x y /= LT
     compare x y 
       | x == y    = EQ
       | x <= y    = LT
       | otherwise = GT
     max x y = case compare x y of
                LT -> x
                _  -> y
     min x y = case compare x y of
                LT -> y
                _  -> x

 * Change definitions of Ord instances in PreludeCore.  At present,
   Ord instances define the "<=" method.  These should be deleted and
   replaced by definitions of the "compare" method. 
 * Add this sentence to Appendix E:

   "The operator compare is defined so at to compare its arguments
    lexicographically (with earlier constructors in the datatype 
    declaration counting as smaller than later ones) returning
    LT, EQ and GT (respectively) as the first argument is strictly
    less than, equal to and strictly greater than the second argument
    (respectively)."


   The methods >, >=, <, <= could be removed from Ord and turned into 
   ordinary overloaded functions.  For efficiency, these could be
   specialized; the GHC specialize pragma allows an explicit definition
   of a function at a particular overloading:

  Specialize (<=) :: Int -> Int -> Bool = primLeInt





Add succ and diff to Enum



Haskell 1.2 provides very limited facilities for operating on
enumerations.  The following elementary operations must be implemented
in an obscure and inefficient manner, if at all:

 * Get the next value in enumeration: (\ x -> head [x..])
 * Get the previous value in enumeration: no reasonable way
 * Get the n'th value in enumeration: [C0..] !! (n - 1)
     (where C0 is first in enumeration)
 * Find where a value occurs in an enumeration: lookup (zip [C0..] [0..]) x 

Proposed changes:

 * Add two new methods to Enum:

   succ :: Int -> a -> a
   diff :: a -> a -> Int

Informally, given an enumeration:

 data T = C0 | C1 | ... Cm

we have:

 diff Ci Cj = i - j
 succ x Ci | 0 <= i+x && i+x 

For example, given the datatype and function:

 data Colour = Red | Orange | Yellow | Green | Blue | Indigo | Violet

 toColour :: Int -> Colour
 toColour i = succ i Red

we would have:

 toColour 0 = Red
 toColour 1 = Orange
 ...
 toColour 6 = Violet


 * Change definitions of Enum instances:

 instance Enum Char where
   succ = primCharSucc
   diff = primCharDiff
   enumFrom     = boundedEnumFrom maxChar
   enumFromThen = boundedEnumFromThen minChar maxChar
  
 boundedEnumFrom hi x | x 

 * Change description of derived instances of Ix for enumerations and Enum:

   Given the enumeration:

 data c => T u1 ... uk = K1 | ... | Kn deriving (C1,...Cm)

succ i Cj returns C(i+j) if 0  Enum a where
    enumFrom = boundedEnumFrom Kn
    enumFromThen = boundedEnumFromThen K1 Kn

and the derived Ix instance is defined by:

  class (Ord a) => Ix a where
    range (l,u) = [l..u]
    index (l,u) i | inrange (l,u) i = diff i u
    inRange (l,u) i = l 


Notes:

 * The types of succ and diff could be changed to:

   succ :: Integer -> a -> a
   diff :: a -> a -> Integer

   The only purpose in doing so would be to allow correct operation
   across the full range of integers.

 *The definitions of boundedEnumFrom[Then] are carefully chosen to
   avoid overflow.  In particular, to avoid: 

     [True..] = True : error "succ{PreludeCore}: result out of range"
     [maxChar..] = maxChar : error "succ{PreludeCore}: result out of range"

   This problem was present in Haskell 1.2 for the types Int, Float and
   Double and could be avoided for Int using boundedEnumFrom instead
   of numericEnumFrom and for Float and Double using similarly defined
   functions.

 * The functionality of diff is almost provided by index.
   The test for an upper bound is distracting.
 * There are overflow and rounding problems associated with Integer
   and floating point numbers.  These do not affect the 
   old Enum methods, which can be implemented as before, but
   may lead to overflow.


The Bounded class

Many built-in types and user-defined types in class Ord have a
natural minimum and maximum value.  Haskell 1.2 uses several
ad-hoc methods for accessing these values.  For example, the
names minInt, maxInt and minChar,
maxChar are defined in the prelude and implicitly used in
appendix E.  (Our proposed changes involving succ and
diff would make those uses explicit.)

It would be much more regular to define a new class Bounded
whose instances have a minimum and maximum value.  This also
allows the functions ord and chr (currently restricted to Chars) to
be generalized to work for any bounded enumeration type.

Proposed changes:

 * Add class Bounded:

 class (Ord a) => Bounded a where
   minBound, maxBound :: a
   -- forall x::a. minBound 

 * Add instances for all appropriate datatypes:

 instance Bounded Char where
   minBound = '\0'
   maxBound = '\255'

 instance Bounded Int where
   minBound = primMinInt
   maxBound = primMaxInt

 instance Bounded Float where
   minBound = primMaxFloat -- new primitive
   maxBound = primMinFloat -- new primitive (negation of primMaxFloat?)

 instance Bounded Double where
   minBound = primMaxFloat -- new primitive
   maxBound = primMinFloat -- new primitive (negation of primMaxFloat?)

 instance Bounded a => Bounded (Maybe a) where
   minBound = Nothing
   maxBound = Just maxBound


 * Add deriving clauses or (), tuples, and  Bool.

 * Changes to appendix E:

 Instances may be derived for any enumeration.  For a type of the form

   data T = C1 | C2 | ... | Cm

 the derived instance is (generated by `deriving Bounded')

   instance Bounded T where
     minBound = C1
     maxBound = Cm

 Instances may also be derived for single-constructor datatypes
 (in particular, tuples).  For a type of the form:

     data c => T u1 ... uk = C t1 ... tm

 where each of t1 ... tm are instances of Bounded, the derived
 instances are

     instance c => Bounded (T u1 ... uk) where
       minBound = C minBound ... minBound
       maxBound = C maxBound ... maxBound


 * Replace the functions ord and chr with the functions:

 fromEnum :: (Bounded a, Enum a) => a -> Int
 fromEnum x = diff x (minBound x)

 toEnum :: (Bounded a, Enum a) => Int -> a
 toEnum x = succ (minBound x) x



Notes:

Adding a new standard class is a larger change than simply adding a 
new method to an existing class - so it's natural to wonder why.
The primary reason is that the (unbounded) type Integer is
an instance of both Ix and Enum but not of
Bounded. 

One possible solution is to define minBound and
maxBound to be    (runtime) errors for type
Integer and add 
the methods minBound and maxBound to
Enum instead.  (They should be renamed minEnum and 
maxEnum if this is done.)  This has the problem that it defers 
detection of a (rare?) type error to runtime.

 Another possible solution is to remove Integer from
Enum.  (This would force Integer to be removed
>from Ix as well).     This is quite a radical change but
probably wouldn't break any code. 

It isn't possible to provide Bounded instances for
Float and Double using Haskell 1.2 primitives.
Easily fixed. 

The names fromEnum and toEnum are misleading since
their types involve both Enum and Bounded.  We couldn't face writing 
fromBoundedEnum and toBoundedEnum.  Suggestions
welcome. 


 Strictness annotations in Ratio and Complex 

Augustsson's FPCA'94 paper demonstrates very significantly
lower memory requirements when these annotations are added to
Complex.  Adding similar annotations to Ratio
should have a similar effect.

Whereas changing the strictness of lists or integers would break
many programs, only the most contrived programs will be affected by
this change.

Proposed changes:

Change definitions of Complex and Ratio to:

 data (RealFloat a) => Complex a = !a :+ !a deriving (Eq, Binary, Text)
 data (Integral a)  => Ratio a   = !a :% !a deriving (Eq, Binary)


 Using Int in list operations 

The Haskell 1.2 prelude forces the use of Integer rather
than the more efficient Int for the very common operations
take, drop and 
splitAt.  There are only two circumstances where the
overhead of using Integer's is worthwhile:

 * To pass conformance tests such as:

   head (take (toInteger maxInt * 2) [1..]) == 1

 * When dealing with very large datasets (more than 2^29 elements)

Users of these very large datasets are probably already using
implementations where maxInt == 2^31-1 or maxInt == 2^63-1
where the problem is less likely to occur.  
The old definitions can be placed in a library should problems arise.

Proposed changes:

Change the types of take, drop and splitAt to:

 take, drop :: Int -> [b] -> [b]
 splitAt :: Int -> [b] -> ([b]. [b])

Add genericTake, genericDrop and
genericSplitAt (with old signatures and definitions) to a
library. 

New functions: replicate, lookup, curry and uncurry

The following functions are deemed sufficiently useful that they
should be added to the Haskell Prelude.

Proposed Changes:

Add replicate and lookup to PreludeList:

   replicate :: Int -> a -> [a]
   replicate n x = take n . repeat x

   lookup :: Eq a => a -> [(a,b)] -> Maybe b
   lookup key [] = Nothing
   lookup key ((x,y):xys) 
     | key == x  = Just y
     | otherwise = lookup key xys

Add curry and uncurry to Prelude:

  curry :: ((a, b) -> c) -> a -> b -> c
  curry f x y = f (x,y)

  uncurry :: (a -> b -> c) -> (a, b) -> c
  uncurry f (x, y) = f x y

Notes:

Adding functions in the Prelude rather than a library forces the
automatic import of the name into every Haskell program.   
This may break existing code which already uses these names or
 prevent a programmer from using these names for their own
functions.  (This is why we chose the name replicate instead
of copy.)  It also expands the "vocabulary" of Haskell -
adding an extra    name which Haskell programmers can be expected to know.

In previous discussion, the name assoc was preferred to
lookup to avoid stealing a useful name from the user.
The library proposal also defines LibFiniteMap.lookup and
LibHashTable.lookup - we feel that stealing the one name for
three purposes is reasonable.

An alternative definition for uncurry is:

      uncurry f ~(x, y) = f x y

This ensures that:

  curry (uncurry f) = f
  uncurry (curry f) = f

Is this important?


Moving functions to libraries

A large prelude helps the programmer by providing a wide range of
pre-written functions to use and combine.  It also hinders the
programmer by making the language larger to learn, preventing the
programmer from using prelude names for their own purposes and
slowing down compilation.

It is possible to have our cake and eat it by simply moving rarely
used functions into standard libraries (the language size remains
the same but the "core language" gets smaller).

The following changes are based on counting how many times certain
functions are used in a large body of Haskell programs (i.e. the
Glasgow Haskell repository).

Proposed changes:

The following functions will be moved into libraries.  The exact
library names have not yet been chosen.

   isAscii, isControl, isPrint, isSpace, isUpper, 
   isLower, isAlpha, isDigit, toUpper, toLower, gcd,
   lcm, ^, ^^, scanl, scanl1, scanr, scanr1, sums, products, zip[3-7],
   unzip[3-7], transpose.

The functionality in class Binary is being rethought.
All support for binary I/O will be in a library.

A number of prelude modules will be moved to libraries in their
entirety: PreludeArray, PreludeComplex, and
PreludeRatio.


Non-overloaded entries to overloaded functions

While the choice of an Eq instance for types such as
Int and Bool are obvious, there may be more than
one way to perform these operations on more complex datatypes.  For
example, string comparison may be either case sensitive or case
insensitive.  A type can have only one style of comparison defined in
instances for Eq or Ord.  

This would normally present no problems, since the user can use some
other function instead of == or < if needed.
However, many prelude functions use == directly.  We propose
to supply an alternative version of every prelude function which uses
a specific class method such as ==.  These will pass the
function as a parameter instead of using one defined in an associated
instance.  We use the suffix "By" to name these alternative functions.

Proposed changes:

Add generalised versions of the Eq/Ord-overloaded functions in PreludeList:

   nubBy, elemBy, notElemBy, minimumBy, maximumBy, 
   deleteFirstsBy, deleteBy, lookupBy

For example:

 nubBy :: (a -> a -> Bool) -> [a] -> [a]
 nubBy eq [] = []
 nubBy eq (x:xs) = x : nubBy (filter (not . eq x) xs)

The original functions will be changed to use new non-overloaded
functions.  For example:

      nub :: Eq a => [a] -> [a]
      nub = nubBy (==)



Notes:

The name deleteFirstsBy (as a pronunciation for \\) is
not ideal.  We welcome any suggestions.
These functions will not be used as heavily as the existing functions
and so could be placed in a separate library (LibList).
However, we need a consistent rule about where to put the
"fooBy" functions --- and "in the same module" is the simplest.
      
Though we have found most of these functions useful in our own
programming, we haven't used all of them.  We considered it
more important to be consistent (by adding all of them) than
to try to guess which were the most useful.


 Numeric Issues 

Haskell 1.2 provides no way to detect IEEE arithmetic values
such as "NaN" or "Infinity".
This seemed reasonable since Haskell 1.2 (and 1.3) doesn't
require IEEE arithmetic.  However, there are no Haskell
implementations on non-IEEE hardware (we have heard rumors
of a Cray implementation "soon") and a few programmers do want
to be able to perform "serious" numerical work in Haskell.

Providing everything numerical analysts could ask for requires
a large amount of work but a few simple additions would make
it possible to perform a few basic operations.

Proposed Changes:

Add methods isNaN, isInfinity, isDenormalized
and isIEEE to RealFloat:

  isNaN, isInfinity, isDenormalized, isIEEE :: RealFloat a => a -> Bool

In non-IEEE implementations, these all return False.
In IEEE implementations, isIEEE returns True and all others
return appropriate responses.

Add line to definition of showFloat:

  showFloat x =
    if isNaN x then showString "<NaN>" 
    else if x == 0 then ... -- remainder of function is unchanged



Notes:

These changes do not preclude a library for full IEEE compatibility.
Such a library is under development.

There is also a need for versions of showFloat which provide
control over the format used and the number of digits displayed
(like the E,F and G format controls in C's printf function).
These will be provided in a library.


 Simplified Lex 


The lex function is very inefficient and overly complex.
Some of this complexity is necessary to ensure that almost any value that
is printed out using standard Text instance can be read
back in.  (Exceptions: functions and infinite data structures.)
However, lex is able to parse several tokens that are never produced
by these Text instances.

Proposed Changes:

Remove support in lex for:

 * Both forms of comment (change to lex)
 * <- (change to lex)
 * "^<ctrl-char>" tokens (change to lexLitChar)
 * octal tokens          (change to lexLitChar)
 * hex tokens            (change to lexLitChar)
 * ASCII abbreviations for control characters (change to lexLitChar)

Move isOctDigit and isHexDigit to the same module
as functions like isAscii and isControl.


Notes:

This won't make lex or read any more efficient.
The problems are: 

 * lex is built using the same fully backtracking
   technology as read (it returns a list of alternative parses
   instead of a single token) but the resulting flexibility doesn't 
   seem to be used anywhere.  This imposes a significant overhead with little
   reward.
 * lex produces strings rather than tokens.  This means
   that every token has to be identified twice: once in lex where the
   input list is split into strings representing tokens and once in
   read where these strings are matched against required input.
   The most effective way of fixing this is to introduce a new datatype
   representing each token. 
 * Derived instances of readsPrec (which is called by
   read) usually have the following form (see Appendix E):

   readsPrec d r = 
      readCon K1 k1 "K1" r ++
      readCon K2 k2 "K2" r ++
          ...
      readCon Km km "Km" r

   In order to successfully parse a string starting with "Km", 
   readsPrec will call the lex function m times.  If 
   lex were separated from read by preprocessing the input
   into a token stream, lex would only have to be called once.
 * When parsing a type with infix constructors, the derived
   instance of readsPrec may relax and reparse the same
   expression many times.  


 * Though a good start, this doesn't make lex much easier to
   understand.  A more effective specification would be a
   grammar or regular expression.  This would also simplify the task of
   feeding the specification into a Haskell lexer/parser generator or
   some similar (more appropriate) implementation technology.

 * Haskell 1.3 will support hex (and octal) specification of numbers.
  It isn't clear whether lex should be modified to support this.


 Undefined 

Add "undefined" to the prelude.  Previous proposals used _ or _|_
as the name.

undefined = error "undefined{Prelude}"

It is expected that compilers will recognize this and insert error messages
which are more appropriate to the context in which undefined
appears.

  
 The Monad class

The Haskell 1.3 IO proposal uses the symbols ">>", ">>=" and
"return" for the IO monad operations and a separate proposal
provides syntactic sugar to make monadic code even easier to
write.  The provision of constructor classes makes it possible
to define these symbols as methods in a "monad class" 
allowing the symbols and sugar to be used for any monad.

Proposed Changes:

Add this class to PreludeCore:

  class Monad m where
    (>>=) :: m a -> (a -> m b) -> m b
    return :: a -> m a

with instances for [], Maybe and IO.

Notes:

Most papers on monads also discuss "monads with zeroes" (examples
are "[]" in the list monad, "Nothing" in the Maybe monad and 
parsing failure in the parsing monad).  We have not yet decided whether to place this 
in the prelude or a standard library.

A standard library will provide the usual range of monadic operations
(maps, folds, filters, guards, etc).


 Minor Changes 

Improve error messages for maximum and minimum:

  maximum [] =  error "maximum{PreludeList}: empty list" 
  maximum l  =  foldl1 max l
  minimum [] =  error "minimum{PreludeList}: empty list" 
  minimum l  =  foldl1 min l

The error messages for head, tail, init and
last use "[]" in their error messages; this will be changes
to "empty list" for consistency.

Add these equations to end of definitions of take, drop and splitAt.

  take _ _    = error "take{PreludeList}: negative argument"
  drop _ _    = error "drop{PreludeList}: negative argument"
  splitAt _ _ = error "splitAt{PreludeList}: negative argument"

Add backquote to list of symbols in definition of isSingle 
in definition of lex.

Remove even/odd from Integral because
we can't imagine why they're not just normal overloaded functions.

Changes made possible by addition of records:
  Use records to define the Haskell 1.3 IO types CalenderTime
  and TimeDiff. 
The zipWith family of names is being renamed to
map2, map3,...  The old names will be left in for
compatibility.

  
 Unfinished Business 


These issues are still being debated.

Naming: are the following names acceptable:  succ, diff, fromEnum
(maybe ord?), and toEnum.

Should there be more IEEE support for standard floats?

Are there more monad functions that ought to be in the prelude?

The Monad0 class probably should be in the prelude.  Useful
functions for the IO monad would be too.  Perhaps:

done = return ()
when p s = if s then p else done
unless p s = when (not p) s
guard p = if p then failWith "Guard failure" else done
Haskell 1.3 Prelude changes

Reply via email to