Hi Ken,

I'd suggest always using ByteString instead of String with Sequence
- it is much more efficient and uses less memory.
Just:
import qualified Data.ByteString.Char8 as BS
and instead of findKmers 20 . toString, rewrite findKmers to work on
ByteString directly, with the following sig:
findKmers :: Int -> BS.ByteString -> [BS.ByteString]
findKmers k xs = findKmers' n k xs
  where n = BS.length xs - k + 1
        findKmers' n' k' xs'
          | n' > 0 = BS.take k' xs' : findKmers' (n' - 1) k' (BS.tail xs')
          | otherwise = []
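
For comparison, here is a minimal sketch of the same sliding window written
with BS.tails instead of explicit recursion. It is an alternative
formulation, not the version I timed; the name findKmersTails and the sample
sequence are made up for illustration. BS.take and BS.tails on a strict
ByteString only produce slices of the original buffer, so no k-mer is copied.

```haskell
import qualified Data.ByteString.Char8 as BS

-- All overlapping substrings of length k, left to right.
-- Hypothetical alternative to the recursive findKmers: take k characters
-- from every suffix and stop at the first suffix that is too short.
findKmersTails :: Int -> BS.ByteString -> [BS.ByteString]
findKmersTails k = takeWhile ((== k) . BS.length) . map (BS.take k) . BS.tails

main :: IO ()
main = mapM_ BS.putStrLn (findKmersTails 3 (BS.pack "ACGTA"))
-- prints ACG, CGT and GTA, one per line
```

If the sequence is shorter than k, the result is simply the empty list.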

This step drastically reduced memory usage in my case, letting
the code finish in about 3 minutes.

real 3m11.755s
user 1m46.364s
sys 1m25.280s

Of course, now it takes 50% of the time opening and closing files... :-)
--
  Cheers
    Michal

On Sun, Jul 26, 2015 at 11:34 AM, Youens-Clark, Charles Kenneth - (kyclark)
<kycl...@email.arizona.edu> wrote:

> On Jul 25, 2015, at 10:43 AM, Nicholas Ingolia <n...@ingolia.org> wrote:
> >
> > I would suggest opening all files at the beginning. Generate all k-mers,
> > open a file for each, and make a HashMap from k-mer to file handle. When
> > you're done, run through the HashMap to close all file handles.
>
> I really like this idea and feel I'm close to having something that
> works.  Here's a bit:
>
> main = do
>   reads <- readFasta "test.fa"
>   let kmers       = concatMap (findKmers 20 . toString . seqdata) reads
>   let allMers     = replicateM 5 "ACTG"
>   let fileHandles = Map.fromList $
>                     map (\x -> (x, openFile ("out/" ++ x) WriteMode))
>                     allMers
>
>   mapM_ (printMer fileHandles) kmers
>
>   mapM_ hClose $ Map.elems fileHandles
>
> Here "fileHandles" type is:
>
> fileHandles :: Map.Map [Char] (IO Handle)
>
> But it would be much easier if the elems were just Handle's.  Is there a
> way to do this?  I tried this:
>
>   let fileHandles = Map.fromList $
>                     map (\x -> do h <- openFile ("out/" ++ x) WriteMode
>                                   (x, h))
>                     allMers
>
> I'm still really hung up on the whole IO monad thing.
>
> Ken
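
On getting plain Handles into the map: one way, sketched below, is to run
the openFile actions inside IO first (forM) and only then build the Map from
the resulting pairs. This is a rough sketch, not tested against your data.
The helper name openHandles, creating the "out" directory up front, and
shortening the keys to 2-mers are my assumptions for the example.

```haskell
import Control.Monad (forM, replicateM)
import qualified Data.Map.Strict as Map
import System.Directory (createDirectoryIfMissing)
import System.IO

-- Open one file per name under dir and return a Map from name to Handle.
-- (Hypothetical helper; the point is that forM runs each openFile in IO,
-- so the Map holds Handle values rather than IO Handle actions.)
openHandles :: FilePath -> [String] -> IO (Map.Map String Handle)
openHandles dir names = do
  createDirectoryIfMissing True dir
  pairs <- forM names $ \name -> do
    h <- openFile (dir ++ "/" ++ name) WriteMode
    return (name, h)
  return (Map.fromList pairs)

main :: IO ()
main = do
  let allMers = replicateM 2 "ACTG"      -- 16 keys, shortened for the demo
  handles <- openHandles "out" allMers
  -- Write one line to the file belonging to the key "AC".
  case Map.lookup "AC" handles of
    Just h  -> hPutStrLn h "ACGT"
    Nothing -> return ()
  mapM_ hClose (Map.elems handles)
```

Note that replicateM 5 "ACTG" gives 1024 names, which may run into the
per-process open file limit on systems where the default is 1024.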




-- 
  Regards
    Michał
