On Friday, 8 February 2013 at 09:08:48 UTC, bioinfornatics wrote:
And use size_t instead of int as the return type of the getChar/getInt methods

gdmd -w -O -release monarch.d
~ $ time ./monarch /env/cns/proj/projet_AZH/A/RunsSolexa/121114_FLUOR_C16L5ACXX/AZH_AOSC_8_1_C16L5ACXX.IND1_clean.fastq
globalStats:
A: 1007129068. C: 1350576504. G: 1353023772. M:   0. D:   0. S: 0. H: 0. N: 39413. V: 0. U: 0. W: 0. R: 0. B: 0. Y: 0. K: 0. T: 999786820.
time: 176585

real    2m56.635s
user    2m31.376s
sys     0m23.077s


This program is a little slower than FG's program.

I've re-run both mine and FG's on an HDD-based machine, with dmd -O -release, and optionally -inline.

I also wrote a new parser, which does as FG suggested and just parses the raw input directly (byLine is indeed more expensive). This one handles whitespace and line breaks correctly. It also accepts lines of any size (the internal buffer grows automatically).

My results are different from yours though:

        w/o inline  w inline
FG      105s        77s
MD       72s        64s
newMD    61s        59s

I have no idea why you guys are getting better results with FG's, and I'm getting better results with mine. Is this a win/linux or dmd/gdc issue? My new parser is based on raw reads, so that should be much faster on your machines.
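To make "raw reads" concrete, here is a minimal sketch, not the actual parser: it reads the file in fixed-size chunks with File.byChunk instead of byLine and tallies every non-whitespace byte. The helper name countLetters and the 64 KiB chunk size are assumptions made for illustration. Note that it counts header and quality lines as well, so it only shows the I/O pattern, not real FASTQ handling.

```d
import std.stdio;

// Hypothetical helper: tally every non-whitespace byte of one raw chunk.
void countLetters(const(ubyte)[] chunk, ref ulong[256] counts)
{
    foreach (b; chunk)
    {
        // Skip spaces, tabs and line breaks (both \n and \r\n endings).
        if (b == ' ' || b == '\t' || b == '\r' || b == '\n')
            continue;
        ++counts[b];
    }
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: ", args[0], " <file>");
        return;
    }

    ulong[256] counts; // one counter per byte value, zero-initialized

    // byChunk reuses a single internal buffer, so no per-line allocation occurs.
    foreach (chunk; File(args[1], "rb").byChunk(64 * 1024))
        countLetters(chunk, counts);

    foreach (c; "ACGTN")
        writefln("%s: %s", c, counts[c]);
}
```

Because the chunk buffer is reused between iterations, anything that must outlive one iteration has to be copied out first.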

About the parser: I would like to create a set of biology parsers and put them into a library, along with a set of common computations such as a letter counter. For example, you could run a letter counter, or rename identifiers, through a FASTA or FASTQ file.

I don't really understand what all that means.

In any case, I've been able to implement some cool features so far. My parser is a "true" range you can pass around, and you won't have any problems with it.

It returns "shallow" objects that reference a mutable string; however, the user can call "dup" or "idup" to get an independent copy.

Said objects can be printed directly, so there is no need for a specialized "writer". As a matter of fact, this little program will allow you to "clean" a file (strip spaces), and potentially, line-wrap at 80 chars:

//----
import std.stdio;

import fastq.parser;
import fastq.q;

void main(string[] args)
{
    Parser parser = new Parser(args[1]);   // input FASTQ file
    File   output = File(args[2], "wb");   // cleaned output file
    foreach(entry; parser)
        output.writefln("%80s", entry);    // write to the output file, not stdout
}
//----
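On the "shallow" objects mentioned earlier, a small self-contained sketch of why "dup" matters. The Entry struct below is hypothetical and is not the type from the fastq library; it only mimics a record whose data is a slice of a buffer that the parser overwrites on the next read.

```d
import std.stdio;

// Hypothetical record type: a shallow view over a buffer owned by the parser.
struct Entry
{
    char[] sequence;

    // Deep copy: the returned Entry owns its own storage.
    Entry dup() const
    {
        return Entry(sequence.dup);
    }
}

void main()
{
    char[] buffer = "ACGT".dup;   // stands in for the parser's internal buffer
    auto shallow = Entry(buffer); // shares memory with the buffer
    auto copy = shallow.dup;      // independent of the buffer

    buffer[] = "TTTT";            // the parser overwrites its buffer for the next record

    writeln(shallow.sequence, " ", copy.sequence); // prints "TTTT ACGT"
}
```

The same caveat applies to std.stdio's byLine, which also reuses its buffer: keep an entry past the current iteration only after copying it.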

I'll submit the parser for your review once it is fully implemented.
