On Wed, Nov 30, 2011 at 9:54 PM, Christian Höner zu Siederdissen
<choe...@tbi.univie.ac.at> wrote:
> I'll give an extremely simple iteratee-based parser a shot on parsing
> Rfam 10.1 full. It contains at least two huge alignments (tRNA has
> 1 000 000 sequences, I think) and SSU-rRNA is probably bad as well. If I
> can keep the memory consumption slightly above what that is in bytes,
> I'll let you know and we can consider extending that...

biostockholm successfully parsed Rfam 9.1's tRNA using 900 MiB of memory.

Good luck implementing something "extremely simple" that reads
Stockholm files =).

> Iteratees could be helpful as one can discard everything from memory
> that is not explicitly kept -- of course for the test I'll fake "full
> parsing" of individual families.
>
> But I would not have expected to see such bad memory behaviour as you
> are using lazy bytestrings. Maybe putting in "ByteString.copy" would
> help when creating the individual sequences, making sure that the input
> stream can be completely garbage collected.

I've tried doing this and the memory usage got worse (besides taking
more time).  Actually, the whole Rfam 9.1 full file is less than 2
GiB uncompressed, so I don't think this is the issue.  I'd need to do
some heap profiles to identify the culprit.
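
For reference, a minimal sketch of what the suggested copying looks like,
assuming line-oriented input. `detachedLines` is a hypothetical helper
written for this illustration, not biostockholm's code. A strict
ByteString taken from a lazy one can share the buffer of the chunk it
came from, so keeping one short line alive may keep the whole chunk
alive; `copy` gives each line its own buffer.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: split lazy input into lines and return each one
-- as a strict ByteString backed by its own freshly allocated buffer.
-- Without S.copy, a line may be a slice that still references the
-- (much larger) input chunk it was cut from.
detachedLines :: L.ByteString -> [S.ByteString]
detachedLines = map (S.copy . L.toStrict) . L.lines

main :: IO ()
main = do
  -- Small made-up input standing in for sequence lines.
  let input = L.pack "seq1 ACGU\nseq2 GGCC\n"
  mapM_ S.putStrLn (detachedLines input)
```

The copy costs an extra allocation and a memcpy per line, which would be
consistent with the extra run time mentioned above. It only pays off
when the retained slices are small compared to the chunks they pin.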

Cheers,

-- 
Felipe.
_______________________________________________
Biohaskell mailing list
Biohaskell@biohaskell.org
http://malde.org/cgi-bin/mailman/listinfo/biohaskell