Simon Marlow wrote:
>
> Jan Kort writes:
>
> > It seems that any record, no matter how trivial, can't be much
> > longer than about 200 lines in Haskell. If I try to compile a
> > 300-line record containing just:
> > data X = X {
> > f1 :: String,
> > f2 :: String,
> > f3 :: String,
> > ...
> > f300 :: String
> > }
> > it needs about 90M of heap in ghc4.06, whereas a 150-line record
> > requires less than 6M. After this big jump the growth levels off
> > to a more moderate, though still steep, increase: a 450-line
> > record requires about 180M of heap.
> >
> > I could file a bug report, but it seems that all compilers
> > (ghc4.06, nhc98, hbc0.9994 and hugs) have this problem. So,
> > is this a fundamental problem?
>
> Actually, the 150-line record needs about 20M, and the 300-line record needs
> about 75M. These figures are roughly double the actual residency, because
> GHC's underlying collector is a copying, not compacting, one.
>
> GHC automatically increases the heap size up to a maximum of 64M unless you
> tell it not to (with -optCrts-M32m, for example). I'll bet this is the
> source of the confusion.
>
> The heap requirement is still non-linear, but I'm guessing that this is
> because for each line you add to the record the compiler has to not only
> generate a new selector function, but also add a field to the record being
> pattern matched against in all the existing selectors.
>
> Cheers,
> Simon
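A rough way to see Simon's point: if every selector is written as a positional pattern match, each of the n selectors mentions all n fields, so the selectors alone contain n * n pattern slots, and the update functions add a comparable quadratic amount. The sketch below is a hypothetical cost model, not a description of what GHC actually generates; the names selectorSlots, selectorSource and totalSlots are made up for illustration.

```haskell
module Main where

-- Hypothetical cost model, not GHC's actual internals: the positional
-- selector for field i of an n-field constructor X mentions all n slots,
-- binding slot i to x and matching the others with wildcards.
selectorSlots :: Int -> Int -> [String]
selectorSlots n i = [if j == i then "x" else "_" | j <- [1 .. n]]

-- Render that selector as source text, e.g. "f2 (X _ x _) = x".
selectorSource :: Int -> Int -> String
selectorSource n i =
  "f" ++ show i ++ " (" ++ unwords ("X" : selectorSlots n i) ++ ") = x"

-- Total pattern slots across all n selectors: n selectors of n slots each.
totalSlots :: Int -> Int
totalSlots n = sum [length (selectorSlots n i) | i <- [1 .. n]]

main :: IO ()
main = do
  -- The three selectors of a 3-field record.
  mapM_ (putStrLn . selectorSource 3) [1 .. 3]
  -- Slot counts for the record sizes discussed above.
  mapM_ (\n -> putStrLn (show n ++ " fields: " ++ show (totalSlots n) ++ " slots"))
        [150, 300, 450]
```

For 150, 300 and 450 fields this gives 22500, 90000 and 202500 slots, so doubling the field count quadruples the count. Simon's figures of about 20M for 150 fields and about 75M for 300 grow by a similar factor (roughly 3.75), although real heap use depends on far more than this one number.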
Thanks for the answers, and sorry for the late reply.
I worked out an example to understand what you wrote.
GHC will probably generate something like this:
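-- A two-field record, written without record syntax
data R = R String Integer
  deriving (Read, Show)
selectA :: R -> String
selectA (R s _) = s
selectB :: R -> Integer
selectB (R _ i) = i
updateA :: R -> String -> R
updateA (R _ b) a = R a b
updateB :: R -> Integer -> R
updateB (R a _) b = R a b
emptyR :: R
emptyR = R undefined undefined  -- fields are filled in by updates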
These can then be used like this:
updateR = updateB (updateA emptyR "a") 2
testA = selectA updateR
testB = selectB updateR
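For comparison, here is a sketch of the same example written with Haskell's record syntax, where the compiler derives the selectors and the update operation itself. The field names a and b are hypothetical, chosen to mirror selectA/selectB and updateA/updateB.

```haskell
module Main where

-- The same two-field type with record syntax; the selectors a and b and
-- the update syntax r { a = ... } are provided by the language.
-- (Field names a and b are assumed for illustration.)
data R = R { a :: String, b :: Integer }
  deriving (Read, Show)

-- Analogue of emptyR: fields left undefined until they are updated.
emptyR :: R
emptyR = R { a = undefined, b = undefined }

-- Two single-field updates, mirroring updateB (updateA emptyR "a") 2.
updateR :: R
updateR = (emptyR { a = "a" }) { b = 2 }

main :: IO ()
main = do
  print (a updateR)
  print (b updateR)
  print updateR
```

Here `print updateR` shows `R {a = "a", b = 2}`. The undefined fields are never forced, because each update replaces them before they are read.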
I agree that the pattern matches in the selectors and updates
would get big for a 300-line record, but 75M is still a lot of
memory, especially because the patterns and right-hand sides
of both the selectors and the updates are trivial pieces of
code: no nesting, no currying, etc. But maybe GHC generates
something extra? Is special code generated for updating
multiple fields, for example?
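On the multiple-field question, the Haskell Report defines record update by translation into a single case expression on the record, however many fields are being set. The sketch below (field names a and b assumed, helper names setBoth, setBoth' and setChained hypothetical) contrasts a two-field update with a hand-written one-match equivalent and with chaining two single-field updates. It shows only the source-level meaning; it cannot show whether GHC emits anything beyond this.

```haskell
module Main where

-- Record-syntax version of the example type (field names assumed).
data R = R { a :: String, b :: Integer }
  deriving (Eq, Show)

-- A two-field update in record syntax.
setBoth :: R -> R
setBoth r = r { a = "a", b = 2 }

-- Hand-written equivalent: one pattern match, one constructor application.
-- Matching on R keeps it strict in the record, as record update is.
setBoth' :: R -> R
setBoth' (R _ _) = R "a" 2

-- Chained single-field updates, as in updateB (updateA emptyR "a") 2;
-- this builds an intermediate record between the two updates.
setChained :: R -> R
setChained r = (r { a = "a" }) { b = 2 }

main :: IO ()
main = do
  let r0 = R "old" 0
  print (setBoth r0)
  print (setBoth' r0)
  print (setChained r0)
```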
I can probably work around this in a simple way: since I'm
generating the big record anyway, I might as well generate
the selectors, updates and emptyR myself and split them
over a couple of files.
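The workaround could look roughly like the following sketch: a generator that emits a positional data type plus an "empty" value in one module, and hand-written selectors and updates spread over several further modules. The module names (RecType, RecAccess1, ...), the get_/set_ prefixes and all helper names are illustrative assumptions, not anything GHC or the original generator defines.

```haskell
module Main where

-- Hypothetical generator for the workaround. Module names (<Con>Type,
-- <Con>Access<k>) and the get_/set_ prefixes are illustrative choices.

-- (field name, field type)
type Field = (String, String)

-- Split a list into chunks of at most k elements (k > 0).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = take k xs : chunksOf k (drop k xs)

-- Parenthesize a type so compound types like "Maybe Int" stay valid.
paren :: String -> String
paren t = "(" ++ t ++ ")"

-- Module holding the positional data type and the empty value.
typeModule :: String -> [Field] -> (String, String)
typeModule con fields =
  ( con ++ "Type"
  , unlines
      [ "module " ++ con ++ "Type where"
      , ""
      , "data " ++ con ++ " = " ++ unwords (con : map (paren . snd) fields)
      , ""
      , "empty" ++ con ++ " :: " ++ con
      , "empty" ++ con ++ " = " ++ unwords (con : map (const "undefined") fields)
      ]
  )

-- Selector for field i of n, e.g. get_f (Con _ v _) = v
selectorDef :: String -> Int -> (Int, Field) -> [String]
selectorDef con n (i, (name, ty)) =
  [ "get_" ++ name ++ " :: " ++ con ++ " -> " ++ paren ty
  , "get_" ++ name ++ " (" ++ unwords (con : [if j == i then "v" else "_" | j <- [1 .. n]]) ++ ") = v"
  ]

-- Update for field i of n, e.g. set_f (Con x1 _ x3) v = Con x1 v x3
updateDef :: String -> Int -> (Int, Field) -> [String]
updateDef con n (i, (name, ty)) =
  [ "set_" ++ name ++ " :: " ++ con ++ " -> " ++ paren ty ++ " -> " ++ con
  , "set_" ++ name ++ " (" ++ unwords (con : [slot "_" j | j <- [1 .. n]])
      ++ ") v = " ++ unwords (con : [slot "v" j | j <- [1 .. n]])
  ]
  where
    slot s j = if j == i then s else 'x' : show j

-- Access modules, each holding selectors and updates for at most k fields.
accessModules :: String -> Int -> [Field] -> [(String, String)]
accessModules con k fields =
  [ ( modName
    , unlines
        ( ["module " ++ modName ++ " where", "", "import " ++ con ++ "Type", ""]
            ++ concatMap (\f -> selectorDef con n f ++ updateDef con n f) chunk
        )
    )
  | (c, chunk) <- zip [1 :: Int ..] (chunksOf k (zip [1 ..] fields))
  , let modName = con ++ "Access" ++ show c
  ]
  where
    n = length fields

-- Print the generated modules for a small 3-field example.
main :: IO ()
main = mapM_ printModule (typeModule "Rec" demo : accessModules "Rec" 2 demo)
  where
    demo = [("name", "String"), ("age", "Integer"), ("tags", "[String]")]
    printModule (modName, src) = putStrLn ("-- " ++ modName ++ ".hs") >> putStr src
```

To produce real files for the 300-field case, one could replace printModule with `writeFile (modName ++ ".hs") src` and use a list such as `[("f" ++ show i, "String") | i <- [1 .. 300 :: Int]]` with a chunk size of, say, 100. Whether splitting actually lowers GHC's peak heap is an assumption worth measuring.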
Jan