Christian Höner writes:

> welcome ;-)

Thanks!

> What you still need to do is to make @stats@ and @foldl@ more strict.
> For this you should check out @foldl'@ (note the prime) and bang
> patterns for @stats@. I'm intentionally not giving the solution!

Ah, thanks for the hints - especially the second one; I was suspecting
the first :-)

> Once you have done that the program should work with arbitrarily long
> inputs.

Indeed, this is what I changed it into:

  {-# LANGUAGE BangPatterns #-}

  import System.Environment
  import Data.List
  import Bio.Sequence.FastQ
  import Bio.Core.Sequence

  main = do
    [f] <- getArgs
    putStrLn . output . average . foldl' stats (0, 0) =<< readIllumina f
      where stats (!count, !totalLength) s = (count+1, 
totalLength+toInteger(seqlength s))

  average (count, totallength) = (count, totallength, t/c)
    where t = fromIntegral totallength :: Float 
          c = fromIntegral count :: Float

  output (count, length, average) = "Count " ++ show count ++ "\n" ++ "Total " 
++ show count ++ " records " ++ show length ++ " length " ++ show average ++ " 
average"

And now it works fine on a fastq-file of 5.1 GB on my desktop with 16GB
RAM.

Let me see if I can find a bigger fastq-file to run it on... Yes, it
works fine on the 33 GB file I just located.

Thanks for the tips!

Do you gradually get an intuitive feeling for when strictness is
"necessary", is it something you'll handle when running into a problem,
or do you do measurements?


  Best regards,

    Adam

-- 
 "A cat has nine lives, but a bullfrog croaks                 Adam Sjøgren
  every day."                                            a...@koldfront.dk

Reply via email to