On Wednesday, 6 February 2013 at 10:43:02 UTC, bioinfornatics wrote:
Instead of calling mmFile's opIndex to read ubyte by ubyte, I tried reading into a buffer array of length PAGESIZE.

code here: http://dpaste.dzfl.pl/25ee34fc

It is not faster: parsing 12 GB takes me 11 minutes. I do not see how I could read the file any faster!

As a reminder, fastxtoolkit needs 2 minutes!

This might be stupid, but I see a "writeln" in your inner loop. You aren't just being slowed down by your console, by any chance?

If I were you, I'd start benchmarking to find out what is slowing you down.

I'd reorganize the code to parse a file that is, say, 512 MB. The rationale being that you can load it into memory all at once. Then, I'd shift the logic from "fully process each character before moving to the next character" to "make a full processing pass over the entire data structure before moving to the next pass".
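
To make the pass-by-pass idea concrete, here is a minimal sketch. It is not taken from the dpaste code: the Data struct, the three function names, and the "one record per line" format are assumptions made purely for illustration, and the processing step is a placeholder that only records each line's length.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.stdio : write;

// Hypothetical record type: one line of the input, viewed as a slice.
struct Data
{
    const(ubyte)[] bytes;
}

// Pass 1: split the whole buffer into records, doing no other work.
Data[] extract(const(ubyte)[] buf)
{
    Data[] records;
    size_t start = 0;
    foreach (i, b; buf)
    {
        if (b == '\n')
        {
            if (i > start)                    // skip empty lines
                records ~= Data(buf[start .. i]);
            start = i + 1;
        }
    }
    if (start < buf.length)                   // last line without a trailing newline
        records ~= Data(buf[start .. $]);
    return records;
}

// Pass 2: process every record (placeholder: just take its length).
size_t[] process(const Data[] records)
{
    auto lengths = new size_t[records.length];
    foreach (i, ref r; records)
        lengths[i] = r.bytes.length;
    return lengths;
}

// Pass 3: build the output in memory so it can be written in a single call.
string render(const size_t[] lengths)
{
    auto sink = appender!string();
    foreach (n; lengths)
        sink.formattedWrite("%s\n", n);
    return sink.data;
}

void main()
{
    // In-memory sample standing in for the file contents.
    auto buf = cast(const(ubyte)[]) "ACGT\nAC\n\nACGTTT";
    write(render(process(extract(buf))));
}
```

Because each pass is a separate function, each one can be timed on its own, and the console is touched exactly once at the end instead of inside the inner loop.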

The steps I see that need to be measured are:

* Raw read of file
* Iterating on your file to extract it as a raw array of "Data" objects
* Processing the Data objects
* Outputting the data
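
A rough sketch of how those four steps could be timed separately follows. The measure helper and the placeholder steps are assumptions for illustration, not the original parser; it reads the whole file with std.file.read, so it is meant for a sample of around 512 MB rather than the full 12 GB file. Timings go to stderr so they do not mix with the program's normal output.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.stdio : stderr, writeln;

// Hypothetical helper: runs one step, reports its wall-clock time on stderr,
// and returns the elapsed milliseconds.
long measure(string label, scope void delegate() step)
{
    auto sw = StopWatch(AutoStart.yes);
    step();
    immutable ms = sw.peek.total!"msecs";
    stderr.writefln("%-16s %s ms", label, ms);
    return ms;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: bench <file>");
        return;
    }

    const(ubyte)[] buf;
    const(ubyte)[][] records;   // stand-in for the array of "Data" objects
    size_t longest;

    // Step 1: raw read of the file into memory.
    measure("raw read", { buf = cast(const(ubyte)[]) read(args[1]); });

    // Step 2: extract records (here: newline-terminated lines).
    measure("extract", {
        size_t start = 0;
        foreach (i, b; buf)
        {
            if (b == '\n')
            {
                records ~= buf[start .. i];
                start = i + 1;
            }
        }
    });

    // Step 3: process the records (placeholder: find the longest one).
    measure("process", {
        foreach (r; records)
            if (r.length > longest)
                longest = r.length;
    });

    // Step 4: output, done once rather than per record.
    measure("output", { writeln(records.length, " records, longest ", longest); });
}
```

If one of the four numbers dominates, that is the step worth optimizing first.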

Also (of course), you need to make sure you are compiling in release mode (might sound obvious, but you never know). Are you using dmd? I have heard the "other" compilers (GDC, LDC) produce faster code.

I'm going to try and see with some example files if I can't get something running faster.
