On Monday, 4 February 2013 at 19:30:59 UTC, Dejan Lekic wrote:
FG wrote:

On 2013-02-04 15:04, bioinfornatics wrote:
I am looking to parse a huge file efficiently, but I think D is lacking for this purpose. To parse 12 GB I need 11 minutes, whereas fastxtoolkit (written in C++) needs 2 min.

My code is maybe not easy, as it is not easy to parse a fastq file, and it is even harder when using a memory-mapped file.

Why are you using mmap? Don't you just go through the file sequentially?
In that case it should be faster to read in chunks:

     foreach (ubyte[] buffer; file.byChunk(chunkSize)) { ... }

I would go even further and organise the file so that N Data objects fit in one page, then read the file page by page. The page size can easily be obtained from the system. IMHO that would beat this fastxtoolkit. :)
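For what it's worth, a rough POSIX-only sketch of that idea: query the page size via sysconf and feed it to byChunk. This is only an illustration of the suggestion, not fastxtoolkit's implementation, and "reads.fastq" is a placeholder file name.

    // Sketch only: read the file in page-sized chunks (POSIX page size).
    import std.stdio : File, writeln;
    import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;

    void main()
    {
        immutable pageSize = cast(size_t) sysconf(_SC_PAGESIZE);  // typically 4096
        writeln("page size: ", pageSize);

        auto file = File("reads.fastq", "rb");          // placeholder input file
        foreach (ubyte[] page; file.byChunk(pageSize))
        {
            // ... parse the records contained in this page ...
        }
    }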

AFAIK, he is reading text data that needs to be parsed line by line, so byChunk may not be the best approach. Or at least, not the easiest approach.
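For illustration, here is a rough sketch of how line-oriented parsing could be layered on byChunk, with a carry-over buffer for lines that span chunk boundaries; processLine is a hypothetical placeholder, not code from this thread.

    import std.stdio : File;

    void processLine(const(ubyte)[] line)
    {
        // ... handle one fastq line here ...
    }

    void parseByChunk(string path, size_t chunkSize = 64 * 1024)
    {
        auto file = File(path, "rb");
        ubyte[] leftover;
        foreach (ubyte[] buffer; file.byChunk(chunkSize))
        {
            auto data = leftover ~ buffer;        // prepend the unfinished line
            size_t start = 0;
            foreach (i, b; data)
            {
                if (b == '\n')
                {
                    processLine(data[start .. i]);
                    start = i + 1;
                }
            }
            leftover = data[start .. $].dup;      // keep the incomplete tail
        }
        if (leftover.length)
            processLine(leftover);                // last line without trailing '\n'
    }

The carry-over buffer is exactly the bookkeeping that byLine does for you, which is why byChunk is faster but not the easiest approach.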

I'm just wondering if maybe the reason the D code is slow is simply because of:
- unicode.
- front + popFront.

Ranges in D are "notorious" for being slow when iterating over text, due to the "double decode": front decodes the current code point, and popFront then has to work out its length again just to advance.

If you are *certain* that the file contains nothing but ASCII (which should be the case for fastq, right?), you can get more bang for your buck if you attempt to iterate over it as an array of bytes, and convert the bytes to char on the fly, bypassing any and all unicode processing.
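Something along these lines (a minimal, ASCII-only sketch, not a full fastq parser; the file name and the '@' counting are purely illustrative):

    import std.stdio : File, writefln;

    void main()
    {
        auto file = File("reads.fastq", "rb");      // placeholder file name
        size_t markers;
        foreach (ubyte[] buffer; file.byChunk(64 * 1024))
        {
            foreach (b; buffer)                     // plain bytes: no UTF-8 decoding
            {
                immutable c = cast(char) b;         // safe if the data is pure ASCII
                if (c == '@')                       // '@' opens a fastq header line
                    ++markers;                      // naive: '@' can also occur in quality strings
            }
        }
        writefln("saw %s '@' bytes", markers);
    }

For strings already in memory, std.string.representation gives the same effect by exposing the underlying immutable(ubyte)[].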
