Re: How to read fastly files ( I/O operation)

monarch_dodra Fri, 08 Feb 2013 06:30:17 -0800

On Friday, 8 February 2013 at 09:08:48 UTC, bioinfornatics wrote:

And use size_t instead of int as the return type of the getChar/getInt methods.
gdmd -w -O -release monarch.d
~ $ time ./monarch /env/cns/proj/projet_AZH/A/RunsSolexa/121114_FLUOR_C16L5ACXX/AZH_AOSC_8_1_C16L5ACXX.IND1_clean.fastq
globalStats:
A: 1007129068. C: 1350576504. G: 1353023772. M:   0. D:   0. S:
0. H: 0. N: 39413. V: 0. U: 0. W: 0. R: 0. B: 0.Y: 0. K: 0. T: 999786820.
time: 176585

real    2m56.635s
user    2m31.376s
sys     0m23.077s


This program is a little slower than FG's program.

I've re-tried running both mine and FG's on an HDD-based machine, with dmd -O -release, and optionally -inline.

I also wrote a new parser, which does as FG suggested and just parses straight up (byLine is indeed more expensive). This one handles whitespace and line breaks correctly. It also accepts lines of any size (the internal buffer grows automatically).
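
To make "parsing straight up" concrete, here is a minimal sketch of counting sequence letters from raw chunks rather than through byLine. It is not the parser described above: the name countSequenceLetters is hypothetical, it uses a fixed chunk size instead of an auto-growing buffer, and it assumes strict four-line FASTQ records (header, sequence, '+', quality).

```d
import std.stdio : File;

/// Counts the letters found on the sequence line of each FASTQ record,
/// scanning the file in raw chunks instead of line by line.
/// Assumption: every record is exactly four lines long.
ulong[256] countSequenceLetters(string path, size_t chunkSize = 64 * 1024)
{
    ulong[256] counts;      // one slot per byte value, all start at zero
    size_t lineInRecord;    // 0 = header, 1 = sequence, 2 = '+', 3 = quality

    auto file = File(path, "rb");
    foreach (ubyte[] chunk; file.byChunk(chunkSize))
    {
        foreach (b; chunk)
        {
            if (b == '\n')
                lineInRecord = (lineInRecord + 1) % 4;   // next line of the record
            else if (lineInRecord == 1 && b != '\r' && b != ' ' && b != '\t')
                ++counts[b];                             // skip whitespace and CR
        }
    }
    return counts;
}
```

Because the line position is carried across chunks, a record that straddles a chunk boundary is still counted correctly. Calling countSequenceLetters(path)['A'] would give the number of A's in the file.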


My results are different from yours though:

        w/o inline  w inline
FG      105s        77s
MD       72s        64s
newMD    61s        59s

I have no idea why you guys are getting better results with FG, and I'm getting better results with mine. Is this a win/linux or dmd/gdc issue? My new parser is based on raw reads, so that should be much faster on your machines.

About the parser: I would like to create a set of biology parsers and put them into a lib, together with a set of common computations such as a letter counter. For example, you could run a letter counter over a FASTA or FASTQ file, or rename identifiers throughout a FASTA or FASTQ file.


I don't really understand what all that means.

In any case, I've been able to implement some cool features so far. My parser is a "true" range you can pass around, and you won't have any problems with it.

It returns "shallow" objects that reference a mutable string; however, the user can call "dup" or "idup" to have a new object.

Said objects can be printed directly, so there is no need for a specialized "writer". As a matter of fact, this little program will allow you to "clean" a file (strip spaces) and, potentially, line-wrap at 80 chars:
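
As an illustration of what "shallow" means here, the following is a toy sketch, not the actual parser. Entry and ToyParser are hypothetical names, and an in-memory array of (name, sequence) pairs stands in for the file. Each entry is a pair of slices into one reused buffer, so it is only valid until the next popFront unless the caller takes a copy with dup.

```d
import std.range.primitives : isInputRange;

/// Hypothetical record: both fields are slices into a parser-owned buffer.
struct Entry
{
    const(char)[] name;
    const(char)[] sequence;

    /// Deep copy that stays valid after the parser reuses its buffer.
    Entry dup() const
    {
        return Entry(name.dup, sequence.dup);
    }
}

/// Toy input range: copies each record into one reusable buffer, the way a
/// file parser would refill its read buffer.
struct ToyParser
{
    private string[2][] source;   // (name, sequence) pairs standing in for a file
    private char[] buffer;        // reused for every record, grows when needed
    private Entry current;
    private bool done;

    this(string[2][] source)
    {
        this.source = source;
        popFront();               // prime the first entry
    }

    @property bool empty() const { return done; }
    @property Entry front() { return current; }

    void popFront()
    {
        if (source.length == 0)
        {
            done = true;
            return;
        }
        auto name = source[0][0];
        auto seq = source[0][1];
        source = source[1 .. $];

        immutable need = name.length + seq.length;
        if (buffer.length < need)
            buffer.length = need;             // grow only, never shrink
        buffer[0 .. name.length] = name[];
        buffer[name.length .. need] = seq[];
        current = Entry(buffer[0 .. name.length], buffer[name.length .. need]);
    }
}

static assert(isInputRange!ToyParser);
```

With this sketch, an entry saved from front changes under the caller's feet after popFront whenever the buffer is reused in place, while the result of front.dup keeps its contents. That is the trade-off: no allocation per record by default, and an explicit copy when a record has to outlive the iteration step.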


//----
import std.stdio;

import fastq.parser;
import fastq.q;

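void main(string[] args)
{
    // args[1]: input FASTQ file, args[2]: cleaned output file
    Parser parser = new Parser(args[1]);
    File   output = File(args[2], "wb");
    foreach(entry; parser)
        output.writefln("%80s", entry); // write to the output file, not stdout
}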
//----
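
How an entry honours a width such as "%80s" is not shown above. One way it could be done in D, offered only as a sketch, is a custom toString that reads the width from the format spec and uses it as the wrap column. The Sequence type below is hypothetical and is not part of the fastq modules.

```d
import std.format : FormatSpec, format;
import std.range.primitives : isOutputRange, put;

/// Hypothetical printable type: the width in the format spec is treated
/// as a line-wrap column rather than as padding.
struct Sequence
{
    string letters;

    /// Writes the letters, breaking the line every `fmt.width` characters.
    /// A width of 0 (plain "%s") means no wrapping.
    void toString(W)(ref W writer, scope const ref FormatSpec!char fmt) const
    if (isOutputRange!(W, char))
    {
        immutable size_t width = fmt.width > 0 ? fmt.width : letters.length;
        for (size_t i = 0; i < letters.length; i += width)
        {
            if (i > 0)
                put(writer, '\n');            // line break between slices
            immutable end = i + width < letters.length ? i + width : letters.length;
            put(writer, letters[i .. end]);
        }
    }
}
```

With this definition, format("%4s", Sequence("ACGTACGTAC")) yields three lines of at most four letters each, and writefln picks up the same toString, so no separate writer type is needed.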

I'll submit it for your review, once it is perfectly implemented.
