Bayley, Alistair wrote:
Currently, I'm experiencing what I would call "strange behaviour":
I've got a data-type
data Fraction = Fraction Int Int
to hold rational numbers (maybe there's already some built-in
type for this in Haskell,
http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Ratio.html
Thanks for the pointer, I knew there would be something already there :)
This list has up to 3 million elements. If I do
main = print $ length $ points
main = print $ length $ map (\(x, _) -> x == Fraction 1 2) points
main = print $ length $ reverse points
However, trying to do
import List
main = print $ length $ sort points
makes memory usage go up and the program does not finish in 15m, also
spending most time waiting for swapped out memory. What am I doing
wrong, why is sort this expensive in this case? I would assume that
computing and holding the whole list does not take too much memory,
given its size and data type; doing the very same calculation in C
should be straight forward. And sort should be O(n * log n) for time
and also not much more expensive in memory, right?
Not having looked at your code, I think you are benefiting from
fusion/deforestation in the first three main functions. In this case,
although you may appear to be evaluating the entire list, in fact the
list elements can be discarded as they are generated, so functions like
length and reverse can run using constant space, rather than O(n) space.
How does reverse work in constant space? At the moment I can't imagine
it doing so; that's why I tried it, but of course you could be right.
The sort function, however, requires that the entire list is retained,
hence greater memory usage. I also think you are optimistic in the
memory requirements of your 3 million element list. A list of Ints will
take a lot more than 4 bytes per element (on 32-bit machines) because
there's overhead for the list pointers, plus possibly the boxes for the
Ints themselves. I think there are 3 machine words for each list entry
(pointer to this element, pointer to next element, info-table pointer),
and 2 words for each Int, but I'm just guessing:
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObje
cts
Of course that's the case, but the list being 3 million elements, and
not, say 100 (which would still fit into memory for a simple C array of
ints) I thought would make it possible. Otherwise, how can one handle
such amounts in data anyway? Only using arrays?
You might get some mileage by suggesting to GHC that your Fraction type
is strict e.g.
data Fraction = Fraction !Int !Int
which might persuade it to unbox the Ints, giving some space savings.
I already tried so, but this doesn't change anything to the performance.
I will however try now to use the provided rational type, maybe this
helps.
Thanks for the answers,
Daniel
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe