Re: How to read fastly files ( I/O operation)

monarch_dodra Wed, 06 Feb 2013 08:10:21 -0800

On Wednesday, 6 February 2013 at 15:40:39 UTC, bioinfornatics wrote:

> It seem in any case is not easy to parse fastly a file in D

I don't think that's true. D provides the same "FILE" primitive you'd get in C, so there is no reason for it to be slower than C.

It is the "range" approach that, as convenient as it is, is not well adapted to certain things.

As I said, I tried writing my own program. In it, I devised a range that, instead of exposing the input character by character, parses an entire "object" (a ... "genome" ... maybe? I called them "Q" in my program) at once. I decided to build it on the very simple "byLine" primitive.

From there, you can query the object for its name/sequence/quality. The irony is that by "parsing twice" (once to do the I/O read, once to do the actual processing of the text), and even though I'm allocating each object individually, I'm running twice as fast as my original, already improved implementation. Not only is it faster, it is also more convenient, since you can extract an entire Q object at once and then operate on it as you please: separation of algorithm and parsing.

It correctly takes into account that a sequence can span multiple lines. It does not strip whitespace because, according to http://maq.sourceforge.net/fastq.shtml, whitespace is not a legal character.
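To make the idea concrete, here is a minimal sketch of such a record-at-a-time range. It is an illustration written for this post, not the code from the paste: the names QRange and byQ and the fields of Q are assumptions, and it is generic over any range of lines so it works with File.byLine as well as with a plain array of strings. It assumes well-formed input where the quality is exactly as long as the sequence.

```d
import std.array : appender;
import std.exception : enforce;
import std.range.primitives : empty, front, popFront;

/// One FASTQ record. "Q" follows the naming in the post; the fields are assumed.
struct Q
{
    string name;
    string sequence;
    string quality;
}

/// Input range that turns a range of lines into a range of Q records.
struct QRange(Lines)
{
    private Lines lines;
    private Q current;
    private bool done;

    this(Lines lines)
    {
        this.lines = lines;
        popFront(); // parse the first record eagerly
    }

    @property bool empty() const { return done; }
    @property Q front() { return current; }

    void popFront()
    {
        // Skip blank lines between records.
        while (!lines.empty && lines.front.length == 0)
            lines.popFront();
        if (lines.empty) { done = true; return; }

        // Header line: '@' followed by the record name.
        enforce(lines.front[0] == '@', "expected '@' at start of record");
        string name = lines.front[1 .. $].idup; // byLine reuses its buffer
        lines.popFront();

        // Sequence: one or more lines, up to the '+' separator line.
        auto seq = appender!string();
        while (!lines.empty && !(lines.front.length && lines.front[0] == '+'))
        {
            seq.put(lines.front);
            lines.popFront();
        }
        enforce(!lines.empty, "missing '+' separator");
        lines.popFront();

        // Quality: read lines until it is as long as the sequence. A quality
        // line may itself start with '@', so length is the only safe stop.
        auto qual = appender!string();
        while (!lines.empty && qual.data.length < seq.data.length)
        {
            qual.put(lines.front);
            lines.popFront();
        }
        enforce(qual.data.length == seq.data.length,
                "quality length does not match sequence length");

        current = Q(name, seq.data, qual.data);
    }
}

/// Convenience wrapper so the range can be used in a UFCS chain.
auto byQ(Lines)(Lines lines)
{
    return QRange!Lines(lines);
}
```

With std.stdio imported, usage would look something like: foreach (q; File("reads.fastq").byLine.byQ) writeln(q.name); where "reads.fastq" is a placeholder file name.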

Now: keep in mind that this approach allocates three new strings for each Q. You could *try* an approach with a pre-allocated, reusable buffer. This would mean you can only operate on one Q at a time, but you'd probably iterate over them faster.
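A rough sketch of what that buffer-reusing variant could look like follows. I have not benchmarked it, and the names QView, ReusingQRange and byQReusing are made up for the example. The three buffers are std.array Appender objects that are cleared and refilled on every popFront, so the slices returned by front are only valid until the next popFront; anything you want to keep has to be copied with .idup.

```d
import std.array : Appender;
import std.exception : enforce;
import std.range.primitives : empty, front, popFront;

/// View of the current record; the slices point into reused buffers.
struct QView
{
    const(char)[] name;
    const(char)[] sequence;
    const(char)[] quality;
}

/// Like a record-at-a-time range, but recycles three buffers instead of
/// allocating three new strings per record.
struct ReusingQRange(Lines)
{
    private Lines lines;
    private Appender!(char[]) name, seq, qual;
    private bool done;

    this(Lines lines)
    {
        this.lines = lines;
        popFront(); // parse the first record eagerly
    }

    @property bool empty() const { return done; }

    /// Valid only until the next call to popFront.
    @property QView front() { return QView(name.data, seq.data, qual.data); }

    void popFront()
    {
        // clear() resets the length but keeps the allocated storage.
        name.clear();
        seq.clear();
        qual.clear();

        // Skip blank lines between records.
        while (!lines.empty && lines.front.length == 0)
            lines.popFront();
        if (lines.empty) { done = true; return; }

        // Header line: '@' followed by the record name.
        enforce(lines.front[0] == '@', "expected '@' at start of record");
        name.put(lines.front[1 .. $]);
        lines.popFront();

        // Sequence lines, up to the '+' separator line.
        while (!lines.empty && !(lines.front.length && lines.front[0] == '+'))
        {
            seq.put(lines.front);
            lines.popFront();
        }
        enforce(!lines.empty, "missing '+' separator");
        lines.popFront();

        // Quality lines, until the quality is as long as the sequence.
        while (!lines.empty && qual.data.length < seq.data.length)
        {
            qual.put(lines.front);
            lines.popFront();
        }
        enforce(qual.data.length == seq.data.length,
                "quality length does not match sequence length");
    }
}

/// Convenience wrapper so the range can be used in a UFCS chain.
auto byQReusing(Lines)(Lines lines)
{
    return ReusingQRange!Lines(lines);
}
```

The trade-off is the one described above: you cannot collect the views into an array and look at them later, because every element would alias the same buffers.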


In any case, you can try it out:
http://dpaste.dzfl.pl/8bdd0c84
