On Wed, Nov 30, 2011 at 9:54 PM, Christian Höner zu Siederdissen
<choe...@tbi.univie.ac.at> wrote:
> I'll give an extremely simple iteratee-based parser a shot on parsing
> Rfam 10.1 full. It contains at least two huge alignments (tRNA has
> 1 000 000 sequences, I think) and SSU-rRNA is probably bad as well. If I
> can keep the memory consumption slightly above what that is in bytes,
> I'll let you know and we can consider extending that...

biostockholm successfully parsed Rfam 9.1's tRNA using 900 MiB of
memory. Good luck implementing something "extremely simple" that reads
Stockholm files =).

> Iteratees could be helpful as one can discard everything from memory
> that is not explicitly kept -- of course for the test I'll fake "full
> parsing" of individual families.
>
> But I would not have expected to see such bad memory behaviour as you
> are using lazy bytestrings. Maybe putting in "ByteString.copy" would
> help when creating the individual sequences, making sure that the input
> stream can be completely garbage collected.

I tried this and the memory usage got worse, and it also took more
time. Actually, the whole Rfam 9.1 full file is less than 2 GiB
uncompressed, so I don't think this is the issue. I'd need to do some
heap profiles to identify the culprit.

Cheers,

--
Felipe.

_______________________________________________
Biohaskell mailing list
Biohaskell@biohaskell.org
http://malde.org/cgi-bin/mailman/listinfo/biohaskell
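
For reference, the "ByteString.copy" idea quoted above can be sketched
roughly as follows. This is only an illustration using strict
bytestrings: detachSequence is a hypothetical helper, not biostockholm
code, and the one-line "<name> <sequence>" layout is an assumption made
for the example.

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical helper, not part of biostockholm: given one
-- "<name> <aligned sequence>" line, return the sequence field in a
-- freshly allocated buffer.
detachSequence :: S.ByteString -> S.ByteString
detachSequence line = S.copy field
  where
    -- 'S.break' and 'S.dropWhile' return slices that share the buffer of
    -- 'line'; without 'S.copy', keeping 'field' alive would keep that
    -- whole buffer alive too.
    field = S.dropWhile (== ' ') (snd (S.break (== ' ') line))

main :: IO ()
main = S.putStrLn (detachSequence (S.pack "seq1/1-10   ACGU..ACGU"))
```

The copy trades extra allocation and time for the chance to free the
input chunk early, so whether it helps depends on how much of each
chunk is retained anyway; a heap profile would be needed to tell.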