Re: Dipping my toes in Haskell, via Bio.Sequence.Fastq

Christian Höner zu Siederdissen Thu, 23 Jul 2015 08:42:06 -0700

Hi Adam,

Yes, that is how it is done! You can also append +RTS -s -RTS to your
program's command line to see runtime statistics. The 5th line is "total
memory in use", and it should be only a couple of MBytes.
The Real World Haskell book has a chapter on profiling and optimization.
In general, you develop a good intuition after a while. The basic idea is
to make the streaming part lazy enough that only one block of input is in
memory at a time, and then to make each calculation strict enough that no
thunks (unevaluated expressions) are retained.
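A minimal sketch of that idea, for illustration only. It does not use Bio.Sequence.Fastq; instead it assumes plain four-line FASTQ records (no line-wrapped sequences) and reads them with the lazy ByteString API that ships with GHC. The names sequenceLines, stats and average are hypothetical helpers, not library functions. The lazy part is reading and splitting the input; the strict part is the accumulator, forced by foldl' and the bang patterns exactly as in your version.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch: average read length of a FASTQ file in (roughly) constant memory.
-- Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')
import System.Environment (getArgs)

-- | Hypothetical helper: keep only the 2nd line of every 4-line record.
sequenceLines :: [BL.ByteString] -> [BL.ByteString]
sequenceLines (_header : sq : _plus : _qual : rest) = sq : sequenceLines rest
sequenceLines _ = []

-- | Strict step function: the bang patterns force both running values on
-- every call, so no chain of (+) thunks can build up.
stats :: (Int, Integer) -> BL.ByteString -> (Int, Integer)
stats (!count, !total) s = (count + 1, total + toInteger (BL.length s))

-- | Mean length; returns 0 for an empty file instead of dividing by zero.
average :: (Int, Integer) -> Double
average (0, _) = 0
average (count, total) = fromIntegral total / fromIntegral count

main :: IO ()
main = do
  [path] <- getArgs
  -- BL.readFile reads the file lazily, chunk by chunk. BL.lines and
  -- sequenceLines consume it incrementally, so chunks that have already
  -- been processed can be garbage collected.
  contents <- BL.readFile path
  print . average . foldl' stats (0, 0) . sequenceLines . BL.lines $ contents
```

To try it, something like ghc -O2 -rtsopts Main.hs followed by ./Main reads.fastq +RTS -s -RTS should work; -rtsopts is needed because by default GHC-compiled programs accept only a couple of "safe" RTS flags. If the streaming behaves as intended, "total memory in use" should stay small regardless of the file size.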

Haskell has a bunch of libraries that help with writing efficient code
for all kinds of problems (numerics: vector, repa; streaming: conduit,
pipes, ...). They make many things work efficiently almost automagically.

My method of choice for improving performance is "benchmarking" via +RTS
-s -RTS to see whether I'm leaking space or have totally botched the
algorithm design. Then I hunt for the usual suspects: missing strictness
annotations. If that fails, I read GHC's intermediate Core language, but
that is only necessary for the kind of programs I write; you won't need
that last part. ;-)
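To illustrate the "hunt for strictness annotations" step, here is a small hypothetical example (meanLazy, meanStrict and Acc are made-up names) contrasting a lazy accumulator with one that uses strict data fields. Whether the lazy version actually leaks depends on the optimization level and on how much GHC's strictness analysis manages to recover, so treat it as a pattern to look for, not a guaranteed reproduction. If you ever do want to look at Core, GHC's -ddump-simpl flag prints it.

```haskell
-- Two running means: a lazy accumulator that may pile up thunks, and a
-- version with strictness annotations on the accumulator's fields.
module Main (main) where

import Data.List (foldl')

-- Possibly leaky: foldl never forces the accumulator, and even foldl'
-- would only force the pair's outer constructor (WHNF), not its fields.
meanLazy :: [Double] -> Double
meanLazy xs = s / fromIntegral n
  where
    (n, s) = foldl (\(c, t) x -> (c + 1, t + x)) (0 :: Int, 0) xs

-- Strict fields (!) mean that evaluating an Acc to WHNF, which foldl'
-- does at every step, also evaluates both running totals.
data Acc = Acc !Int !Double

meanStrict :: [Double] -> Double
meanStrict xs = total / fromIntegral count
  where
    Acc count total = foldl' step (Acc 0 0) xs
    step (Acc c t) x = Acc (c + 1) (t + x)

main :: IO ()
main = do
  print (meanLazy [1, 2, 3, 4])
  -- Run with +RTS -s -RTS (after compiling with -rtsopts) to check
  -- the memory figures on a long input.
  print (meanStrict [1 .. 1e7])
```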

Best regards,
Christian

* Adam Sjøgren <a...@koldfront.dk> [23.07.2015 17:02]:
> Indeed, this is what I changed it into:
> 
>     putStrLn . output . average . foldl' stats (0, 0) =<< readIllumina f
>       where stats (!count, !totalLength) s =
>               (count + 1, totalLength + toInteger (seqlength s))
> 
> And now it works fine on a 5.1 GB FASTQ file on my desktop with 16 GB
> of RAM.
> 
> Thanks for the tips!
> 
> Do you gradually develop an intuitive feel for when strictness is
> "necessary"? Is it something you handle when you run into a problem,
> or do you take measurements?
> 
> 
>   Best regards,
> 
>     Adam
> 
> -- 
>  "A cat has nine lives, but a bullfrog croaks                 Adam Sjøgren
>   every day."                                            a...@koldfront.dk