Re: [Haskell-cafe] uvector package appendU: memory leak?

Manlio Perillo Wed, 01 Apr 2009 03:16:53 -0700

wren ng thornton wrote:

> Manlio Perillo wrote:
>> Since ratings for each customer are parsed "at the same time", using a plain list would consume a lot of memory, since stream fusion can only be executed at the end of the parsing.
>> On the other hand, when I want to group ratings by movies, stream fusion seems to work fine.
>> [...]
>
> For the problem as you've discussed it, I'd suggest a different approach: You can't fit all the data into memory at once, so you shouldn't try to. You should write one program that takes in the per-movie grouping of data and produces a per-user file as output.

Well, creating 480189 files in a directory is not a very nice thing to do to a normal file system.

I should arrange the files in directories, but then this starts to become too complex.


The solution I'm using now just works.

It takes about 950 MB of memory and 35 minutes, but it's not a big problem since:

1) Once loaded, I can serialize the data in binary format
2) I think that the program can be parallelized, parsing
   subsets of the files in N threads, and then merging the maps.

   Using this method should also optimize array copying.

   The problem is that unionWith seems to be lazy, and there seems to be
   no strict variant; I'm not sure.
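
A minimal sketch of the merge step in point 2, assuming a recent containers release, where Data.Map.Strict exports a unionWith that evaluates each combined value to weak head normal form. The type names and the helpers fromMovie, mergeRatings and mergeAll are hypothetical, and plain lists stand in for the uvector arrays, so this only illustrates the idea and is not the program discussed above:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Word (Word8)

-- Hypothetical types, chosen only for this illustration.
type UserId  = Int
type MovieId = Int
type Rating  = Word8

-- All ratings given by one user (a list here, not an unboxed array).
type UserRatings = [(MovieId, Rating)]

-- Turn the ratings parsed from one movie file into a per-user map.
fromMovie :: MovieId -> [(UserId, Rating)] -> M.Map UserId UserRatings
fromMovie movie rs = M.fromListWith (++) [ (u, [(movie, r)]) | (u, r) <- rs ]

-- Merge two partial maps. The strict unionWith forces each combined
-- value to WHNF. For a list that is only the first cons cell; for an
-- unboxed array it would be the whole array.
mergeRatings :: M.Map UserId UserRatings
             -> M.Map UserId UserRatings
             -> M.Map UserId UserRatings
mergeRatings = M.unionWith (++)

-- Merge the maps produced by several workers, left to right, forcing
-- the accumulated map at every step.
mergeAll :: [M.Map UserId UserRatings] -> M.Map UserId UserRatings
mergeAll = foldl' mergeRatings M.empty
```

Each thread would build its own map from a subset of the files with fromMovie, and mergeAll would combine the partial results once the threads are done.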

> Then have your second program read in the reorganized data and do fusion et al.
> This reduces the problem to just writing the PerMovie -> PerUser program. Since you still can't fit all the data into memory, that means you can't hope to write the per-user file in one go.


The data *do* fit into memory, fortunately.

> [...]


> Best of luck.



Thanks,
Manlio
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
