Olaf writes:

> your program looks almost exactly how I'd write it, except for the
> foldl' Christian mentioned.

Nice to hear! It is very simple, as you say, so maybe that's also why
I'm not that far off.

> I also doubt that the Haskell program can really outperform a
> well-written C program on such a simple task.

I agree. But the C program I am taking on, as it were, is not really
well-written. For one thing, it does malloc()/free() for every line.
(Oh, and it doesn't handle big numbers, it overflows without detecting
it :-))

So I am cheating, by having my program use a probably quite
well-written runtime against a more-or-less naïve C implementation.

When the time is dominated by disk access, the timings are very close
(C first, then Haskell):

  $ for f in small_29M.fastq large_5G.fastq huge_33G.fastq; do time fastqstats $f; done
  Count 199957 Total 199957 records 9997850 length 50 average

  real    0m0.129s
  user    0m0.098s
  sys     0m0.000s
  Count 10085674 Total 10085674 records -1893163715 length -187.708 average

  real    0m19.975s
  user    0m8.335s
  sys     0m1.841s
  Count 63074335 Total 63074335 records -143886218 length -2.28122 average

  real    2m7.448s
  user    0m56.549s
  sys     0m10.825s
  $ for f in small_29M.fastq large_5G.fastq huge_33G.fastq; do time hfastqstats $f; done
  Count 199957 Total 199957 records 9997850 length 50.0 average

  real    0m0.120s
  user    0m0.048s
  sys     0m0.015s
  Count 10085674 Total 10085674 records 2401803581 length 238.1401 average

  real    0m19.911s
  user    0m4.276s
  sys     0m2.120s
  Count 63074335 Total 63074335 records 12741015670 length 202.0 average

  real    2m11.627s
  user    0m31.264s
  sys     0m13.468s
  $

So what happens when the disk cache is hot? I only have 16 GB RAM in my
desktop, so I'll exclude the 33 GB file, and run the two programs a
number of times.

After 10 runs of each, I get these numbers (C first again, then
Haskell):

  11
  fastqstats
  Count 199957 Total 199957 records 9997850 length 50 average

  real    0m0.097s
  user    0m0.097s
  sys     0m0.000s
  Count 10085674 Total 10085674 records -1893163715 length -187.708 average

  real    0m8.681s
  user    0m7.979s
  sys     0m0.696s
  hfastqstats
  Count 199957 Total 199957 records 9997850 length 50.0 average

  real    0m0.066s
  user    0m0.062s
  sys     0m0.004s
  Count 10085674 Total 10085674 records 2401803581 length 238.1401 average

  real    0m3.904s
  user    0m3.212s
  sys     0m0.688s
  $

which is kind of fun.

> In my eyes, the strength of Haskell is hidden in the readIllumina
> function: Bioinformatics is 50% parsing and converting text formats.

That's also why I like BioPerl a lot - someone else did the parsing
for me :-)

Thanks for the comments.


  Best regards,

    Adam

-- 
 "No more than that, but very powerful all the same;          Adam Sjøgren
  simple things are good."                                a...@koldfront.dk