Re: How to read fastly files ( I/O operation)

monarch_dodra Wed, 06 Feb 2013 04:35:14 -0800

On Wednesday, 6 February 2013 at 11:15:22 UTC, monarch_dodra wrote:

I'm going to try and see with some example files if I can't get something running faster.

After some benchmarking and tweaking, I found three things that speed up your program:

1) Make computeLocal a compile-time constant. This gives you a tiny bit of performance. Whether it is worth it depends on whether you plan to make it a run-time argument switch, I guess.
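
A minimal sketch of what point 1 could look like. Only the name computeLocal comes from the program being discussed; the function countBases and the meaning given to the flag here are made up for illustration. With an enum, the branch is resolved by "static if" at compile time, so the inner loop carries no run-time test for it.

```d
import std.stdio;

// Compile-time constant. Its role in the real program is not shown in this
// post, so the behaviour below is purely illustrative.
enum bool computeLocal = true;

// Hypothetical helper: counts characters, skipping 'N' when computeLocal is set.
size_t countBases(const(char)[] seq)
{
    size_t n;
    foreach (c; seq)
    {
        static if (computeLocal)
        {
            // Only this branch is compiled in when computeLocal is true.
            if (c != 'N')
                ++n;
        }
        else
        {
            ++n;
        }
    }
    return n;
}

void main()
{
    writeln(countBases("ACGTN"));
}
```

If the flag has to become a command-line switch later, it would need to go back to being an ordinary variable, and this small gain is lost.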


2) Makes things about 10%-20% faster:

Your "nucleic" and "amino" hash tables map a character to an index. However, given the range of the characters ('A' to 'Z'), you are better off using a flat array in which each index represents a character, e.g. A is index 0, B is index 1. This way, a lookup is a simple array indexing operation rather than a hash table lookup.

You may even get a bigger bang for your buck by laying out your "_stats" structure the same way ("A is index 0, B is index 1") and only re-ordering the data at the end, when you want to read it. (I haven't done this, though.)
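
A rough sketch of the flat-array lookup from point 2. The alphabet order "ACGTN" and the names buildTable, nucleicIndex and lookup are assumptions made for this example, not taken from the original program. The table has one slot per letter from 'A' to 'Z' and stores -1 for letters that are not tracked.

```d
import std.stdio;

// Builds a 26-entry table mapping 'A'..'Z' to a position in `alphabet`.
// Letters that do not occur in `alphabet` keep the value -1.
byte[26] buildTable(string alphabet)
{
    byte[26] table = -1;
    foreach (i, c; alphabet)
        table[c - 'A'] = cast(byte) i;
    return table;
}

// Evaluated at compile time; the alphabet order is a hypothetical choice.
immutable byte[26] nucleicIndex = buildTable("ACGTN");

// Plain array indexing instead of a hash table lookup.
int lookup(char c)
{
    return (c >= 'A' && c <= 'Z') ? nucleicIndex[c - 'A'] : -1;
}

void main()
{
    size_t[5] counts;
    foreach (c; "ACGTTN")
    {
        immutable idx = lookup(c);
        if (idx >= 0)
            ++counts[idx];
    }
    writeln(counts);
}
```

The range check in lookup keeps lower-case letters and other bytes from indexing outside the table; it can be dropped if the input is known to be upper-case only.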

3) Makes things about 100% faster (ran in half the time on my machine): I don't know how mmFile works, but a simple File + "rawRead" seems to get the job done fast. Also, instead of keeping track of one (or several) indexes, I merely keep a single slice. The only thing I care about is whether my slice is empty, in which case I refill it.
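
The single-slice idea from point 3 can be sketched roughly as follows. This is not the code from the paste, just an illustration of the pattern: countNewlines, the 64 KiB buffer size and the demo file are all assumptions. File.rawRead fills the buffer and returns the part that was actually read, which comes back empty at end of file.

```d
import std.path : buildPath;
import std.stdio;
static import std.file;

// Counts '\n' bytes, reading the file in large chunks.
// The only parsing state is `slice`, the unread part of the buffer.
size_t countNewlines(string path)
{
    auto file = File(path, "rb");
    auto buffer = new ubyte[64 * 1024]; // buffer size is an arbitrary choice
    ubyte[] slice;
    size_t lines;
    for (;;)
    {
        if (slice.length == 0)
        {
            slice = file.rawRead(buffer); // refill from the file
            if (slice.length == 0)
                break;                    // nothing left to read
        }
        if (slice[0] == '\n')
            ++lines;
        slice = slice[1 .. $];            // consume one byte
    }
    return lines;
}

void main()
{
    // Write a tiny demo file so the sketch runs on its own.
    auto path = buildPath(std.file.tempDir, "rawread_demo.fastq");
    std.file.write(path, "@r1\nACGT\n+\nIIII\n");
    scope (exit) std.file.remove(path);
    writeln(countNewlines(path));
}
```

A real parser would consume more than one byte per step, but the refill check stays the same: whenever the slice runs empty, read the next chunk.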

The modified code is linked below. I'm apparently getting the same output you are, but that doesn't mean there are no bugs in it. For example, I noticed that you don't strip leading whitespace, if any, before the first read.

http://dpaste.dzfl.pl/9b9353b8

----

I'd be tempted to rewrite the parser using a "byLine" approach, since my quick reading about fastq seems to imply it is a line-based format. Or just try to write a parser from scratch, putting my own logic and thought into it (all I did was modify your code, without caring about the actual algorithm).
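
For what it's worth, a byLine version might start out like the sketch below. It assumes the simplest fastq layout, where every record is exactly four lines (header, sequence, "+", quality) and sequences are not wrapped across lines; files that break this assumption would need a smarter parser. The function name countSequenceChars and the demo data are made up for the example.

```d
import std.path : buildPath;
import std.stdio;
static import std.file;

// Sums the lengths of all sequence lines, i.e. the second line of each
// four-line record. Assumes strict four-line records.
size_t countSequenceChars(string path)
{
    size_t total;
    size_t lineNo;
    foreach (line; File(path).byLine())
    {
        if (lineNo % 4 == 1)
            total += line.length; // byLine drops the trailing '\n' by default
        ++lineNo;
    }
    return total;
}

void main()
{
    // Write a tiny demo file so the sketch runs on its own.
    auto path = buildPath(std.file.tempDir, "byline_demo.fastq");
    std.file.write(path, "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n");
    scope (exit) std.file.remove(path);
    writeln(countSequenceChars(path));
}
```

Note that byLine reuses its internal buffer between iterations, so a line has to be copied (for example with .dup) if it needs to outlive the loop body.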
