Bayley, Alistair wrote:
Currently, I'm experiencing what I would call "strange behaviour":

I've got a data-type

data Fraction = Fraction Int Int

to hold rational numbers (maybe there's already some built-in type for this in Haskell,

http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Ratio.html

Thanks for the pointer, I knew there would be something already there :)

This list has up to 3 million elements.  If I do

main = print $ length $ points

main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points

main = print $ length $ reverse points

However, trying to do

import List
main = print $ length $ sort points

makes memory usage go up and the program does not finish in 15m, also spending most time waiting for swapped out memory. What am I doing wrong, why is sort this expensive in this case? I would assume that computing and holding the whole list does not take too much memory, given its size and data type; doing the very same calculation in C should be straight forward. And sort should be O(n * log n) for time and also not much more expensive in memory, right?

Not having looked at your code, I think you are benefiting from
fusion/deforestation in the first three main functions. In this case,
although you may appear to be evaluating the entire list, in fact the
list elements can be discarded as they are generated, so functions like
length and reverse can run using constant space, rather than O(n) space.

How does reverse work in constant space? At the moment I can't imagine it doing so; that's why I tried it, but of course you could be right.

The sort function, however, requires that the entire list is retained,
hence greater memory usage. I also think you are optimistic in the
memory requirements of your 3 million element list. A list of Ints will
take a lot more than 4 bytes per element (on 32-bit machines) because
there's overhead for the list pointers, plus possibly the boxes for the
Ints themselves. I think there are 3 machine words for each list entry
(pointer to this element, pointer to next element, info-table pointer),
and 2 words for each Int, but I'm just guessing:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObje
cts

Of course that's the case, but the list being 3 million elements, and not, say 100 (which would still fit into memory for a simple C array of ints) I thought would make it possible. Otherwise, how can one handle such amounts in data anyway? Only using arrays?

You might get some mileage by suggesting to GHC that your Fraction type
is strict e.g.

data Fraction = Fraction !Int !Int

which might persuade it to unbox the Ints, giving some space savings.

I already tried so, but this doesn't change anything to the performance. I will however try now to use the provided rational type, maybe this helps.

Thanks for the answers,
Daniel

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to