Does anyone know whether the situation described below is as bad in, say, SML/NJ or OCaml?
JanBrosius
----- Original Message -----
From: Jan Kort <[EMAIL PROTECTED]>
To: Simon Marlow <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, March 16, 2000 12:59 PM
Subject: Re: records in Haskell
> Simon Marlow wrote:
> >
> > Jan Kort writes:
> >
> > > It seems that any record, no matter how trivial, can't be much
> > > longer than about 200 lines in Haskell. If I try to compile a
> > > 300 line record containing just:
> > > data X = X {
> > >     f1 :: String,
> > >     f2 :: String,
> > >     f3 :: String,
> > >     ...
> > >     f300 :: String
> > >   }
> > > It needs about 90M heap in ghc4.06. Whereas a 150 line record
> > > requires less than 6M heap. After this big gap it levels off
> > > to a somewhat more decent exponential increase: a 450 line
> > > record requires about 180M heap.
> > >
> > > I could file a bug report, but it seems that all compilers
> > > (ghc4.06, nhc98, hbc0.9994 and hugs) have this problem. So,
> > > is this a fundamental problem ?
> >
> > Actually, the 150-line record needs about 20M, and the 300-line record
> > needs about 75M. These figures are roughly double the actual residency,
> > because GHC's underlying collector is a copying, not compacting, one.
> >
> > GHC automatically increases the heap size up to a maximum of 64M unless
> > you tell it not to (with -optCrts-M32m, for example). I'll bet this is the
> > source of the confusion.
> >
> > The heap requirement is still non-linear, but I'm guessing that this is
> > because for each line you add to the record the compiler has to not only
> > generate a new selector function, but also add a field to the record
> > being pattern matched against in all the existing selectors.
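A rough way to see what that guess implies: if a record with n fields gets n selectors, and each selector matches an n-field pattern, the total number of pattern positions grows as n squared. The sketch below only does that arithmetic. The name `patternSlots` and the cost model itself are assumptions for illustration, not GHC's actual accounting.

```haskell
-- Hypothetical cost model, not taken from GHC:
-- n fields give n selectors, each matching a pattern with n positions.
patternSlots :: Int -> Int
patternSlots n = n * n

main :: IO ()
main = mapM_ report [150, 300, 450]
  where
    report n =
      putStrLn (show n ++ " fields: " ++ show (patternSlots n) ++ " pattern positions")
```

Under this model, going from 150 to 300 fields multiplies the work by 4, which is at least in the same range as the quoted jump from about 20M to about 75M (a factor of roughly 3.75).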
> >
> > Cheers,
> > Simon
>
>
> Thanks for the answers and sorry for the late reaction.
>
> I worked out an example to understand what you wrote.
> GHC will probably generate something like this:
>
> data R = R String Integer
>   deriving (Read,Show)
>
> selectA (R s _) = s
> selectB (R _ i) = i
>
> updateA (R _ b) a = (R a b)
> updateB (R a _) b = (R a b)
>
> emptyR = R undefined undefined
>
> Which you can then use like this:
>
> updateR = updateB (updateA emptyR "a") 2
> testA = selectA updateR
> testB = selectB updateR
>
> I agree that the select and update pattern matchings would
> get big for a 300 line record, but 75M is a lot of memory.
> Especially because the pattern matches and the right hand
> sides of both the selects and updates are trivial pieces
> of code: no nesting, no currying etc. But maybe GHC
> generates something extra ? Is special code generated
> for updating multiple fields for example ?
>
> I can probably work around this in a simple way: since I'm
> generating the big record, I might as well generate
> the selects, updates and emptyR instead and split them
> over a couple of files.
>
> Jan
>
>
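For comparison, here is a minimal sketch of how the hand-written selects and updates in the quoted message line up with ordinary record syntax. The field names `fieldA` and `fieldB` are made up for this example, and nothing here claims to show the code GHC really generates; it only checks that the two styles give the same results on a two-field record.

```haskell
-- Record syntax: the compiler supplies the selectors fieldA and fieldB
-- and supports the update form r { fieldB = ... }.
data R = R { fieldA :: String, fieldB :: Integer }
  deriving (Show, Eq)

-- Hand-written equivalents in the style of the quoted message.
selectA :: R -> String
selectA (R s _) = s

updateB :: R -> Integer -> R
updateB (R x _) y = R x y

main :: IO ()
main = do
  let r = R { fieldA = "a", fieldB = 1 }
  print (fieldA r == selectA r)            -- selector vs. hand-written select
  print (r { fieldB = 2 } == updateB r 2)  -- record update vs. hand-written update
```

With only two fields each pattern is tiny; the concern in the thread is what happens when every such pattern has 300 positions.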