Hi Ken,

I'd suggest using ByteString instead of String with Sequence at all times: it is much more efficient and uses less memory. Just add

    import qualified Data.ByteString.Char8 as BS

and instead of findKmers 20 . toString, use a ByteString version of findKmers with the following signature:

    findKmers :: Int -> BS.ByteString -> [BS.ByteString]
    findKmers k xs = findKmers' n k xs
      where
        -- number of k-mers in a sequence of this length
        n = BS.length xs - k + 1
        findKmers' n' k' xs'
          | n' > 0    = BS.take k' xs' : findKmers' (n' - 1) k' (BS.tail xs')
          | otherwise = []

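
As a quick check of the function above, here is a minimal sketch on a toy sequence. It assumes strict ByteStrings and k >= 1; the input "ACGTAC" and kmersViaTails are made up for illustration and are not part of the original code. kmersViaTails is just another way to spell the same sliding window using BS.tails.

```haskell
import qualified Data.ByteString.Char8 as BS

-- Sliding window: every substring of length k, left to right.
findKmers :: Int -> BS.ByteString -> [BS.ByteString]
findKmers k xs = go (BS.length xs - k + 1) xs
  where
    go n ys
      | n > 0     = BS.take k ys : go (n - 1) (BS.tail ys)
      | otherwise = []

-- Hypothetical alternative: walk the suffixes and keep only those
-- that are still long enough to hold a full k-mer.
kmersViaTails :: Int -> BS.ByteString -> [BS.ByteString]
kmersViaTails k =
  map (BS.take k) . takeWhile ((>= k) . BS.length) . BS.tails
```

For example, findKmers 3 (BS.pack "ACGTAC") gives the four 3-mers "ACG", "CGT", "GTA" and "TAC", and a sequence shorter than k gives an empty list. BS.take and BS.tail on a strict ByteString share the underlying buffer rather than copying it, which is probably where most of the memory saving comes from.
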
This step drastically reduced the memory usage in my case, letting the code finish in about 3 minutes:

    real 3m11.755s
    user 1m46.364s
    sys  1m25.280s

Of course, now it spends 50% of the time opening and closing files... :-)

--
Cheers
Michal

On Sun, Jul 26, 2015 at 11:34 AM, Youens-Clark, Charles Kenneth - (kyclark) <kycl...@email.arizona.edu> wrote:

> On Jul 25, 2015, at 10:43 AM, Nicholas Ingolia <n...@ingolia.org> wrote:
> >
> > I would suggest opening all files at the beginning. Generate all k-mers,
> > open a file for each, and make a HashMap from k-mer to file handle. When
> > you're done, run through the HashMap to close all file handles.
>
> I really like this idea and feel I'm close to having something that
> works. Here's a bit:
>
> main = do
>   reads <- readFasta "test.fa"
>   let kmers = concatMap (findKmers 20 . toString . seqdata) reads
>   let allMers = replicateM 5 "ACTG"
>   let fileHandles = Map.fromList $
>         map (\x -> (x, openFile ("out/" ++ x) WriteMode))
>         allMers
>
>   mapM_ (printMer fileHandles) kmers
>
>   mapM_ hClose $ Map.elems fileHandles
>
> Here "fileHandles" type is:
>
> fileHandles :: Map.Map [Char] (IO Handle)
>
> But it would be much easier if the elems were just Handle's. Is there a
> way to do this? I tried this:
>
> let fileHandles = Map.fromList $
>       map (\x -> do h <- openFile ("out/" ++ x) WriteMode
>                     (x, h))
>       allMers
>
> I'm still really hung up on the whole IO monad thing.
>
> Ken

--
Regards,
Michał
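
P.S. On the Handle question in the quoted message: one way to end up with plain Handles is to run the openFile actions with mapM before building the map, instead of storing the unexecuted IO actions in it. A minimal sketch, assuming Data.Map from containers and that the output directory already exists; openOne, openHandles and closeHandles are names made up here for illustration.

```haskell
import qualified Data.Map as Map
import System.IO

-- Open a single file and pair the resulting Handle with its key.
openOne :: FilePath -> String -> IO (String, Handle)
openOne dir name = do
  h <- openFile (dir ++ "/" ++ name) WriteMode
  return (name, h)

-- mapM runs every openFile in IO, so the map holds Handle values
-- rather than IO Handle actions.
openHandles :: FilePath -> [String] -> IO (Map.Map String Handle)
openHandles dir names = do
  pairs <- mapM (openOne dir) names
  return (Map.fromList pairs)

-- Close every handle stored in the map.
closeHandles :: Map.Map String Handle -> IO ()
closeHandles = mapM_ hClose . Map.elems
```

In main this would read something like fileHandles <- openHandles "out" allMers, using <- rather than let, because building the map is itself an IO action. The second attempt in the quoted code fails to typecheck for a related reason: the last line of the do block needs return (x, h), and the resulting list of IO actions still has to be run with mapM or sequence.
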