Simon Marlow wrote:
> 
> Jan Kort writes:
> 
> > It seems that any record, no matter how trivial, can't be much
> > longer than about 200 lines in Haskell. If I try to compile a
> > 300-line record containing just:
> > data X = X {
> >         f1 :: String,
> >         f2 :: String,
> >         f3 :: String,
> >         ...
> >         f300 :: String
> > }
> > It needs about 90M heap in ghc4.06, whereas a 150-line record
> > requires less than 6M heap. After this big gap it levels off
> > to a somewhat more decent exponential increase: a 450-line
> > record requires about 180M heap.
> >
> > I could file a bug report, but it seems that all compilers
> > (ghc4.06, nhc98, hbc0.9994 and hugs) have this problem. So,
> > is this a fundamental problem?
> 
> Actually, the 150-line record needs about 20M, and the 300-line record needs
> about 75M.  These figures are roughly double the actual residency, because
> GHC's underlying collector is a copying, not compacting, one.
> 
> GHC automatically increases the heap size up to a maximum of 64M unless you
> tell it not to (with -optCrts-M32m, for example).  I'll bet this is the
> source of the confusion.
> 
> The heap requirement is still non-linear, but I'm guessing that this is
> because for each line you add to the record the compiler has to not only
> generate a new selector function, but also add a field to the record being
> pattern matched against in all the existing selectors.
> 
> Cheers,
>         Simon


Thanks for the answers, and sorry for the late reply.

I worked out an example to understand what you wrote.
GHC will probably generate something like this:

        -- the record type itself, with the field labels dropped
        data R = R String Integer
                 deriving (Read,Show)

        -- one selector per field; each selector pattern matches
        -- against every field of the constructor
        selectA (R s _) = s
        selectB (R _ i) = i

        -- likewise one update function per field
        updateA (R _ b) a = (R a b)
        updateB (R a _) b = (R a b)

        emptyR  = R undefined undefined

Which you can then use like this:

        updateR = updateB (updateA emptyR "a") 2
        testA   = selectA updateR
        testB   = selectB updateR

I agree that the select and update pattern matchings would
get big for a 300-line record, but 75M is a lot of memory,
especially because the pattern matches and the right-hand
sides of both the selects and updates are trivial pieces
of code: no nesting, no currying, etc. But maybe GHC
generates something extra? Is special code generated
for updating multiple fields, for example?
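
For instance, I would guess (the names R3, fieldA/fieldB/fieldC
and updateAB below are made up just for illustration) that an
update of two fields out of three gets turned into a single match
that rebuilds the whole constructor, rather than into a chain of
single-field updates:

        data R3 = R3 { fieldA :: String, fieldB :: Integer, fieldC :: Bool }

        -- a possible translation of  r { fieldA = a, fieldB = b }:
        -- the whole constructor is matched and rebuilt, so the
        -- untouched field (fieldC here) gets copied explicitly too
        updateAB :: R3 -> String -> Integer -> R3
        updateAB (R3 _ _ c) a b = R3 a b c

        testAB :: R3
        testAB = updateAB (R3 undefined undefined True) "a" 2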

I can probably work around this in a simple way: since I'm
generating the big record, I might as well generate
the selects, updates and emptyR instead and split them
over a couple of files.
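
Roughly like this, with three fields standing in for the 300 and
with invented module names (BigData, BigSelect, BigUpdate) just to
sketch the split:

        -- BigData.hs: the plain data declaration, field labels dropped
        module BigData where

        data Big = Big String String String   -- f1 .. f300 in reality

        -- BigSelect.hs: the generated selectors, in their own file
        module BigSelect where
        import BigData

        selectF1 (Big s _ _) = s
        selectF2 (Big _ s _) = s
        selectF3 (Big _ _ s) = s

        -- BigUpdate.hs: the generated updates plus the empty record
        module BigUpdate where
        import BigData

        updateF1 (Big _ b c) a = Big a b c
        updateF2 (Big a _ c) b = Big a b c
        updateF3 (Big a b _) c = Big a b c

        emptyBig = Big undefined undefined undefined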

  Jan
