Re: How to read fastly files ( I/O operation)

monarch_dodra Wed, 06 Feb 2013 12:45:18 -0800

On Wednesday, 6 February 2013 at 19:19:52 UTC, FG wrote:

On 2013-02-04 15:04, bioinfornatics wrote:
I am looking to parse huge files efficiently, but I think D is lacking for this purpose. To parse 12 GB I need 11 minutes, whereas fastxtoolkit (written in C++) needs 2 minutes.
Haven't compared to fastxtoolkit, but I have some code for you.
I have processed the file SRR077487_1.filt.fastq from
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00096/sequence_read/
and my code expects this syntax (no multiline sequences or whitespace).
The file takes up almost 6 GB, and processing took 1m45s, twice as fast as the
fastest D solution so far.
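FG's D code is not reproduced in this message. As a rough illustration of what "this syntax" means, here is a minimal Python sketch (not FG's implementation) that assumes every record is exactly four lines: an `@` header, one sequence line, a `+` separator line, and one quality line. The names `FastqRecord` and `parse_fastq_strict` are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    id: str        # header text after '@', up to the first space
    comment: str   # remainder of the header line (may be empty)
    sequence: str
    quality: str


def parse_fastq_strict(stream: TextIO) -> Iterator[FastqRecord]:
    """Parse FASTQ assuming exactly four lines per record:
    '@header', sequence, '+[optional repeat]', quality.
    Multiline sequences and stray whitespace are NOT supported."""
    while True:
        header = stream.readline()
        if not header:
            return  # clean end of file
        seq = stream.readline().rstrip("\n")
        plus = stream.readline()
        qual = stream.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at header {header!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        name, _, comment = header[1:].rstrip("\n").partition(" ")
        yield FastqRecord(name, comment, seq, qual)


# Usage on a tiny in-memory example.
data = "@r1 lane1\nACGT\n+\nIIII\n@r2\nGG\n+\n#!\n"
records = list(parse_fastq_strict(io.StringIO(data)))
```

Because this relies on the fixed four-line layout, it never has to scan for record boundaries, which is the main reason such parsers can be fast; the cost is that wrapped records are rejected or misparsed.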

Do you mean my solution above? I tried your solution with dmd, with -release -O -inline, and both gave about the same result (69s yours, 67s mine).

Data contains both the sequence letters and the associated quality information. Sequence ID and comment are slices of the buffer, so they hold valid data only
until you move to the next sequence (and the number increments).
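The trade-off FG describes (slices into a reused buffer, valid only until the next record) can be sketched in Python with `memoryview`. This is an illustration under assumptions, not FG's D code: the helper name `iter_fastq_views` is hypothetical, records are assumed to be strict four-line records with `\n` line endings, and each record must fit in the buffer. The header slice holds the ID and comment together.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# (header without '@', sequence, quality), all views into one shared buffer
Record = Tuple[memoryview, memoryview, memoryview]


def iter_fastq_views(stream: BinaryIO, bufsize: int = 1 << 16) -> Iterator[Record]:
    """Yield zero-copy slices of a single reused buffer. A yielded slice is
    only valid until the generator is resumed: the buffer is then compacted
    and refilled in place, overwriting the bytes the old slice points at."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    start = end = 0  # buf[start:end] holds bytes not yet parsed
    eof = False
    while True:
        # Look for four newline-terminated lines in buf[start:end].
        nl = []
        pos = start
        for _ in range(4):
            i = buf.find(b"\n", pos, end)
            if i < 0:
                break
            nl.append(i)
            pos = i + 1
        if len(nl) == 4:
            if buf[start] != ord("@"):
                raise ValueError("record does not start with '@'")
            h_end, s_end, p_end, q_end = nl
            yield (view[start + 1:h_end],  # header (ID + comment)
                   view[h_end + 1:s_end],  # sequence
                   view[p_end + 1:q_end])  # quality
            start = q_end + 1
            continue
        if eof:
            if start != end:
                raise ValueError("truncated final record")
            return
        # Move the partial record to the front (same-size copy, so the
        # bytearray is never resized while views exist), then refill.
        remaining = end - start
        buf[0:remaining] = buf[start:end]
        start, end = 0, remaining
        if end == bufsize:
            raise ValueError("record larger than buffer")
        n = stream.readinto(view[end:])
        if not n:
            eof = True
        else:
            end += n


# Demonstrate the invalidation: a tiny buffer forces a refill between records.
data = b"@r1\nACGT\n+\nIIII\n@r2\nTTAA\n+\n####\n"
records = iter_fastq_views(io.BytesIO(data), bufsize=20)
hdr1, seq1, _ = next(records)
kept_seq1 = bytes(seq1)  # an explicit copy stays valid
hdr2, seq2, qual2 = next(records)
# seq1 now shows record 2's bytes, because the buffer was reused.
```

Callers that need to keep a field past the next iteration must copy it (`bytes(view)`), which is exactly the allocation the zero-copy design avoids by default.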


Hum. Mine allocates new slices, so they are never invalidated :)
Mine also takes newlines and lowercase sequences into account.
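Handling wrapped (multiline) records is less trivial than it sounds, because a quality line may itself begin with `@` or `+`. One common approach, sketched here in Python as an assumption rather than as monarch_dodra's implementation, is to read quality lines until their combined length equals the sequence length. Each field is a freshly allocated string, so nothing is invalidated when the next record is read. `parse_fastq_multiline` is a hypothetical name.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_fastq_multiline(stream: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, SEQUENCE, quality) as new strings.
    Sequence and quality may be wrapped over several lines; the sequence is
    upper-cased. Quality lines are consumed until their total length matches
    the sequence length, since a quality line may start with '@' or '+'."""
    line = stream.readline()
    while line:
        if not line.strip():  # skip blank lines between records
            line = stream.readline()
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@', got {line!r}")
        header = line[1:].rstrip("\r\n")
        # Sequence: every line up to the '+' separator.
        seq_parts = []
        line = stream.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = stream.readline()
        if not line:
            raise ValueError("missing '+' separator")
        seq = "".join(seq_parts).upper()
        # Quality: read lines until we have as many symbols as bases.
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            line = stream.readline()
            if not line:
                raise ValueError("truncated quality string")
            part = line.strip()
            qual_parts.append(part)
            qual_len += len(part)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError("quality length does not match sequence length")
        yield header, seq, qual
        line = stream.readline()


# Record r1 is wrapped, lowercase, and has quality lines starting with '@' and '+'.
data = "@r1 x\nacgt\nNN\n+\n@@II\n+I\n@r2\nGG\n+\nII\n"
records = list(parse_fastq_multiline(io.StringIO(data)))
```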

Still, it seems you and I took different approaches. I had mentioned using a reusable buffer. I'm going to try to digest some of your code to see if I can't improve my implementation.


@bioinfornatics

I'm getting really interested in the subject. I'm going to try to write an actual library/framework for working with FASTQ files in a D environment.

This means I'll try to write robust and usable code, with both stability and performance in mind, as opposed to the "proofs of concept" so far.

For now, I'd like to keep it simple: would something that only knows how to parse/write Sanger FASTQ files be of help to you?
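For reference, Sanger FASTQ stores each Phred quality score Q as the ASCII character with code Q + 33, covering '!' (Q = 0) through '~' (Q = 93), and Q relates to the base-call error probability by p = 10^(-Q/10). A minimal Python sketch of the encode/decode step such a library would need follows; the function names are hypothetical.

```python
from typing import List

SANGER_OFFSET = 33  # Sanger FASTQ: character code = Phred score + 33
MAX_SANGER_Q = 93   # '~' (126) - 33


def decode_sanger(qual: str) -> List[int]:
    """Convert a Sanger quality string to Phred scores."""
    scores = [ord(c) - SANGER_OFFSET for c in qual]
    if any(q < 0 or q > MAX_SANGER_Q for q in scores):
        raise ValueError("character outside Sanger range '!'..'~'")
    return scores


def encode_sanger(scores: List[int]) -> str:
    """Convert Phred scores back to a Sanger quality string."""
    if any(q < 0 or q > MAX_SANGER_Q for q in scores):
        raise ValueError("Phred score outside 0..93")
    return "".join(chr(q + SANGER_OFFSET) for q in scores)


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def write_record(header: str, seq: str, scores: List[int]) -> str:
    """Emit one four-line Sanger FASTQ record (bare '+' separator)."""
    if len(seq) != len(scores):
        raise ValueError("sequence and quality lengths differ")
    return f"@{header}\n{seq}\n+\n{encode_sanger(scores)}\n"
```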


If I write something, can I have you review it?
