Hi,
(Quick background: I'm used to Perl and have been trying to start learning Haskell a number of times, using every book I could lay my hand on, over the years, but never succeeded, maybe because I needed something "real" to use it for. The main problem for me is usually going from the nice, abstract concepts and into "making it work" in real life. I don't know the idioms of the language...) I just had a go at using (Bio)Haskell to beat a simple C-program that reads through a fastq-file and simply counts the number of sequences, adds up the total length, and outputs those figures along with the average length. I cobbled together this program using Google, Hoogle, documentation and guessing: import System.Environment import Bio.Sequence.FastQ import Bio.Core.Sequence main = do [f] <- getArgs putStrLn . output . average . foldl stats (0, 0) =<< readIllumina f where stats (count, totalLength) s = (count+1, totalLength+toInteger(seqlength s)) average (count, totallength) = (count, totallength, t/c) where t = fromIntegral totallength :: Float c = fromIntegral count :: Float output (count, length, average) = "Count " ++ show count ++ "\n" ++ "Total " ++ show count ++ " records " ++ show length ++ " length " ++ show average ++ " average" Very simple, and surely not quite the way someone fluent in the language would write it. On my test-example, which was a fastq file with ~200K sequences of 50 bp length each, the Haskell program beat the C program by a factor of 2+. Nice! (I'm speculating that the penalty for explicit memory handling is a part of the difference, k*200K malloc+free calls must take some time...) I have two questions: a) If anyone has time to suggest how this small program would be written by someone fluent in Haskell, I would love to read and learn. b) If I run my program on a big file, the I get "Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it." So I guess I am doing something that means that the entire file gets read into memory at the same time - any pointers on reasoning about this and fixing this are very welcome as well. Is this something like I am fold'ing in the wrong direction? I hope that using Haskell for some "real life" stuff will make it easier for me to get into it this time around. Best regards, Adam P.S. I tried sending this back in March, but it doesn't seem to have gotten through, and I got distracted. Apologies if you have seen it before. -- "You've got to be excited about what you are doing." Adam Sjøgren a...@koldfront.dk