On 2013-02-04 20:39, monarch_dodra wrote:

AFAIK, he is reading text data that needs to be parsed line by line, so
byChunk may not be the best approach. Or at least, not the easiest
approach.

He can still read a chunk from the file, or the whole file, and then process that buffer line by line.
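Something along these lines is what I have in mind. It's only a rough sketch: countLines is a made-up helper name, and it assumes the buffer fits in memory. It walks an already-loaded buffer with std.string.lineSplitter, which hands back slices of the buffer instead of allocating a new string per line.

```d
import std.string : lineSplitter;

/// Hypothetical helper: counts the non-empty lines in a buffer that has
/// already been read into memory (a chunk or the whole file).
size_t countLines(const(char)[] text)
{
    size_t n;
    // lineSplitter yields each line as a slice of `text`, without copying.
    foreach (line; text.lineSplitter)
    {
        if (line.length != 0)
            ++n;
    }
    return n;
}
```

To feed it a whole file, something like `cast(const(char)[]) std.file.read("reads.fastq")` should do (the file name is just a placeholder). Unlike readText, read does not validate the contents as UTF-8.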

I'm just wondering if maybe the reason the D code is slow is not just
because of:
- unicode.
- front + popFront.

Ranges in D are "notorious" for being slow to iterate on text, due to
the "double decode".

If you are *certain* that the file contains nothing but ASCII (which
should be the case for fastq, right?), you can get more bang for your
buck if you attempt to iterate over it as an array of bytes, and convert
the bytes to char on the fly, bypassing any and all Unicode processing.

Depending on what you're doing, you can blast through the bytes even if it's Unicode. Of course, that will not validate the Unicode.
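As a small illustration, and again only a sketch: countGC is a made-up name, and counting G and C is just a stand-in for whatever per-byte work is actually needed. std.string.representation reinterprets the chars as ubytes, so nothing that touches the data goes through the decoding front/popFront.

```d
import std.string : representation;

/// Hypothetical helper: counts the 'G' and 'C' bytes in a sequence
/// without decoding or validating any UTF-8.
size_t countGC(const(char)[] seq)
{
    size_t n;
    // representation() gives a const(ubyte)[] view of the same memory,
    // so every element is a raw byte rather than a decoded dchar.
    foreach (b; seq.representation)
    {
        if (b == 'G' || b == 'C')
            ++n;
    }
    return n;
}
```

This stays correct on UTF-8 input as well, because every byte of a multi-byte sequence is 0x80 or above and so can never compare equal to an ASCII letter.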

--
/Jacob Carlborg
