Re: Speeding up text file parser (BLAST tabular format)

Edwin van Leeuwen via Digitalmars-d-learn Mon, 14 Sep 2015 06:16:06 -0700

On Monday, 14 September 2015 at 12:50:03 UTC, Fredrik Boulund wrote:

On Monday, 14 September 2015 at 12:44:22 UTC, Edwin van Leeuwen wrote:
Sounds like this program is actually IO bound. In that case I would not really expect an improvement from using D. What is the CPU usage like when you run this program?
Also, which dmd version are you using? I think there were some performance improvements for file reading in the latest version (2.068).
Hi Edwin, thanks for your quick reply!
I'm using v2.068.1; I actually got inspired to try this out after skimming the changelog :).
Regarding whether it is IO-bound: I actually expected it would be, but both the Python and the D version consume 100% CPU while running, and just copying the file around only takes a few seconds (cf. 15-20 sec of runtime for the two programs). There's bound to be some aggressive file caching going on, but I figure that would rather normalize program runtimes at lower times after running them a few times, and I see nothing indicating that.


Two things that you could try:

First, hitlists.byKey can be expensive (especially if hitlists is big). Instead use:


foreach( key, value ; hitlists )

Also, the filter.array.length is quite expensive. You could use count instead.

import std.algorithm : count;
value.count!(h => h.pid >= (max_pid - max_pid_diff));
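
Put together, the two suggestions might look roughly like the sketch below. This is not the original program: the Hit struct, the sample data, the countNearBest helper and the way the maximum pid is computed are all assumptions made for illustration. Only hitlists, pid and the count predicate come from the thread.

```d
import std.algorithm : count, map, max, reduce;
import std.stdio : writeln;

// Hypothetical hit record; only the `pid` field appears in the thread.
struct Hit
{
    string target;
    double pid;
}

// Hypothetical helper: count hits whose pid is within maxPidDiff of the
// best pid in the list. Assumes `hits` is non-empty, because reduce
// without a seed requires at least one element.
size_t countNearBest(const(Hit)[] hits, double maxPidDiff)
{
    immutable maxPid = hits.map!(h => h.pid).reduce!max;
    // count walks the range once and allocates nothing,
    // unlike filter!(...).array.length, which builds a temporary array.
    return hits.count!(h => h.pid >= (maxPid - maxPidDiff));
}

void main()
{
    // Made-up sample data standing in for the parsed BLAST hits.
    Hit[][string] hitlists = [
        "query1": [Hit("a", 99.0), Hit("b", 97.5), Hit("c", 80.0)],
        "query2": [Hit("d", 90.0)]
    ];

    // Key and value arrive together, so there is no second lookup per key.
    foreach (key, value; hitlists)
        writeln(key, ": ", countNearBest(value, 5.0));
}
```

The order in which an associative array is iterated is unspecified, so the lines may print in either order.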
