On Tue, Nov 29, 2011 at 10:32 PM, Christian Höner zu Siederdissen
<choe...@tbi.univie.ac.at> wrote:
> how much interest is there for iteratee-based fasta reading? Has someone
> already written something?

I don't know. While it would be nice, currently that's not something
that I need myself.

> Since iteratee- (or enumerator-based parsing in general) is strict in
> its output, there are some considerations regarding large files. On the
> other hand, sometime in early 2012 I'll probably provide a library to
> efficiently handle tasks on large sequence-based files.

What do you mean by "strict in its output"? Do you mean that each
sequence of the FASTA file would need to be held in memory?

I guess there are two different FASTA readers possible, depending on
if the stream is based on (just examples)

    data FastaSeq = FastaSeq SeqLabel SeqData

or

    data FastaItem = FastaLabel SeqLabel | FastaData SeqData

Using FastaSeq you get a simple-to-use interface that needs to hold
each sequence in memory. Using FastaItem you get something like a SAX
parser where the stream may be consumed in constant memory usage
(something like [FastaLabel ..., FastaData ..., FastaData ...,
FastaData ...] where each data chunk is of a limited size), but where
it's a little bit more difficult to write programs.

Assuming that we wrote some FASTA parser using enumeratees, I guess
FastaItem is the way to go, since it's possible to have an enumeratee
that converts FastaItems into FastaSeqs.

Cheers,

--
Felipe.

_______________________________________________
Biohaskell mailing list
Biohaskell@biohaskell.org
http://malde.org/cgi-bin/mailman/listinfo/biohaskell
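
To make the FastaItem-to-FastaSeq conversion concrete, here is a minimal
sketch over plain Haskell lists rather than a real enumeratee. It only
illustrates the shape of the conversion, under some assumptions: SeqLabel
and SeqData are taken to be String (the message leaves them abstract), the
function name toSeqs is hypothetical, and data chunks that appear before
any label are simply dropped. An actual enumeratee would do the same
grouping incrementally on the stream.

```haskell
-- Assumption: the message leaves these two types abstract; String is
-- used here only to keep the sketch self-contained.
type SeqLabel = String
type SeqData  = String

-- One complete record: the whole sequence is held in memory.
data FastaSeq = FastaSeq SeqLabel SeqData
  deriving (Eq, Show)

-- SAX-style events: a label, followed by any number of bounded data chunks.
data FastaItem = FastaLabel SeqLabel | FastaData SeqData
  deriving (Eq, Show)

-- True for data chunks, False for labels.
isData :: FastaItem -> Bool
isData (FastaData _) = True
isData _             = False

-- Hypothetical helper: group each label with the data chunks that follow
-- it and concatenate those chunks into a single FastaSeq.
toSeqs :: [FastaItem] -> [FastaSeq]
toSeqs [] = []
toSeqs (FastaLabel l : rest) =
    FastaSeq l (concat [d | FastaData d <- chunks]) : toSeqs rest'
  where
    (chunks, rest') = span isData rest
-- Assumption: data with no preceding label is skipped in this sketch.
toSeqs (FastaData _ : rest) = toSeqs rest

-- Example event stream in the form described in the message.
exampleItems :: [FastaItem]
exampleItems =
  [ FastaLabel "seq1", FastaData "ACGT", FastaData "TTGA"
  , FastaLabel "seq2", FastaData "GG"
  ]
```

Loading this in GHCi and evaluating `toSeqs exampleItems` gives one
FastaSeq per label. The opposite direction (FastaSeq to FastaItem) is also
easy to write, but it cannot recover the memory advantage, which is why
the item stream is the more general base representation.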