Claus Reinke wrote:
appendU is strict, insertWith just doesn't force it (follow the source link
in the haddocks to see why).

Ok, I see.
But, IMHO, this should be clearly documented.

There seems to be some agreement that strict variant operations should
also be provided, but it needs some more thinking, and whenever I look
at the code, I have the feeling that there are just too many operations
defined in it, so I look away again:-(


Well, it does not really matter whether a strict variant of the operation is provided or not.
But at least it should be documented whether an implemented operation is strict or lazy.


However, reading the code now, I prefer my version using alter.

Yes, it is a more direct way to enforce strict evaluation. Btw, if your
update function 'f' is strict in 'old' and 'new', you might just want to use 'Just $! f old new' and save all those 'seq's.


Right, thanks.
And this seems to be a bit more memory friendly, too.
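For the archives, this is roughly what the alter-based version looks like with that suggestion applied (a sketch of mine; the helper name and the 'f old new' argument order are my own choices, not library API):

    import qualified Data.IntMap as IM

    -- Insert with a combining function, forcing the combined value before
    -- it is stored, so no chain of thunks builds up inside the map.
    -- The combining function receives the old value first, as in
    -- Claus's 'Just $! f old new'.
    insertWithStrict :: (a -> a -> a) -> IM.Key -> a -> IM.IntMap a -> IM.IntMap a
    insertWithStrict f k new = IM.alter g k
      where
        g Nothing    = Just new
        g (Just old) = Just $! f old new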

By the way, about insertWith/alter; from IntMap documentation:

insertWithKey: O(min(n,W))
alter: O(log n)

So, is alter more efficient than insertWithKey?
And what is that `W`?

'W' is a maximum bound (see comment at top of that haddock page).

Where?
At the top of that page there is only a link to a paper where the algorithm is defined.

'alter' has some shortcuts if the update function doesn't actually do
anything, but for reimplementing 'insertWithKey', the complexity should be the same, as the code would be the same.


Ok.
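In other words, the reimplementation would be something like this (my own sketch, not the library source):

    import qualified Data.IntMap as IM

    -- insertWithKey re-expressed via alter: the traversal of the tree is
    -- the same, so the two documented bounds describe the same cost.
    insertWithKeyViaAlter :: (IM.Key -> a -> a -> a) -> IM.Key -> a -> IM.IntMap a -> IM.IntMap a
    insertWithKeyViaAlter f k new = IM.alter g k
      where
        g Nothing    = Just new
        g (Just old) = Just (f k new old)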

But piling up appendUs is still not a good idea. For a moment,
I thought that the stream representation's (+++) was handling
runtime fusion gracefully, but a simple test case suggests otherwise,
at least for the simpler case of consU (the attached appendU.hs
doesn't do any appendUs, as the consU case already demonstrates
the issue; be careful with large numbers here, it'll quickly eat your RAM):

I'm not sure I fully understand the code.
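My best guess at the kind of test it performs is something like this (my own sketch, assuming uvector's Data.Array.Vector interface, not the attached appendU.hs):

    import Data.List (foldl')
    import Data.Array.Vector   -- uvector

    -- Build an array element by element with consU: every step copies the
    -- whole array, so the total work is quadratic and the intermediate
    -- arrays pile up in memory; fusion does not rescue this at run time.
    buildByCons :: Int -> UArr Int
    buildByCons n = foldl' (flip consU) (toU []) [1 .. n]

    main :: IO ()
    main = print (sumU (buildByCons 100000))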

[ Ok ]
But, again, IMHO it does not apply to my original problem.

It is a question of resource constraints, probably. Instead of maintaining
an 'IntMap (UArr Int)' all the way through buildup, you could use an
'IntMap [Int]' while parsing, and then compress into 'IntMap (UArr Int)'
when parsing is finished.

No, as I have written, this is not possible for my problem: it would eat all the available memory after a few minutes.
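(The final compression step itself would be trivial, something like the sketch below, assuming uvector's toU and lists built by consing during parsing; it is the intermediate 'IntMap [Int]' that does not fit.)

    import qualified Data.IntMap as IM
    import Data.Array.Vector (UArr, toU)

    -- After parsing: turn each per-customer rating list into an unboxed
    -- array; the reverse restores parse order, assuming the lists were
    -- built by consing new ratings onto the front.
    compress :: IM.IntMap [Int] -> IM.IntMap (UArr Int)
    compress = IM.map (toU . reverse)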

The data I'm parsing, from the Netflix Prize data set, contains:
17770 movies
480189 customers (with customer ids ranging from 1 to 2649429)
100480507 total ratings


Ratings are grouped by movie, but I want to group them by customer.
This means that each customer has only a few ratings (about 200 on average), so there are a lot of small arrays.

Since the ratings of each customer are parsed "at the same time" (interleaved with other customers' ratings), using plain lists would consume a lot of memory, since the conversion to arrays (and any fusion) can only happen at the end of the parsing.

On the other hand, when I group ratings by movie, stream fusion seems to work fine.
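(When grouping by movie, all the ratings of one movie are parsed consecutively, so each array can be built in a single pass and never extended afterwards; roughly, and again assuming uvector's toU:)

    import qualified Data.IntMap as IM
    import Data.Array.Vector (UArr, toU)

    -- Grouping by movie: the ratings of a movie arrive contiguously in
    -- the input, so its array is built once from the just-parsed list
    -- and never appended to afterwards.
    insertMovie :: Int -> [Int] -> IM.IntMap (UArr Int) -> IM.IntMap (UArr Int)
    insertMovie movieId ratings = IM.insert movieId (toU ratings)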

This is IMHO, of course.

If you can afford the larger memory profile
during parsing, this should give better overall performance, with the
same memory need for the final state, but larger memory needs up
to that state. I'd be interested in the differences, if you try it.


Unfortunately, I cannot afford this memory profile!
My PC has only 2 GB of RAM.

And I doubt anyone else could afford the required memory.
Using UArr, the total required memory is about 900 MB.



Thanks  Manlio
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
