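It looks like you're opening the files in printMer but never closing them
there. Your map holds IO Handle actions rather than Handles, so every
"h <- ioh" runs openFile again, and the second attempt to open a file that is
already open for writing fails with the "file is locked" error. The second
mapM_ opens them again and then immediately closes them, which doesn't seem
very useful.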
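
For what it's worth, here is a rough sketch of what I mean: run openFile once
per bin up front, store the resulting Handle (not the IO action) in the map,
and close everything at the end. Treat it as an illustration under a few
assumptions rather than a drop-in fix: I used Data.Map.Strict from containers
instead of your Data.HashMap, left out the FASTA reading, and the names
openBins and binKmers and the prefix-length parameter are made up for the
example. The demo uses a prefix length of 2 (16 files); with 5 you would hold
4^5 = 1024 handles open at once, which may run into the per-process open-file
limit on some systems.

```haskell
import Control.Exception (finally)
import Control.Monad (forM, replicateM)
import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.Map.Strict as Map
import System.Directory (createDirectoryIfMissing)
import System.IO

-- Open one output file per prefix of length n, exactly once,
-- and keep the actual Handle in the map (hypothetical helper).
openBins :: FilePath -> Int -> IO (Map.Map String Handle)
openBins dir n = do
  createDirectoryIfMissing True dir
  pairs <- forM (replicateM n "ACTG") $ \bin -> do
    h <- openFile (dir ++ "/" ++ bin) WriteMode
    return (bin, h)
  return (Map.fromList pairs)

-- Write the tail of a k-mer to the file selected by its first n characters.
-- The handle is only looked up here, never reopened.
printMer :: Int -> Map.Map String Handle -> B.ByteString -> IO ()
printMer n handles mer =
  case Map.lookup bin handles of
    Nothing -> hPutStrLn stderr ("Missing " ++ bin ++ " handle")
    Just h  -> B.hPutStrLn h rest
  where
    (prefix, rest) = B.splitAt (fromIntegral n) mer
    bin            = B.unpack prefix

-- Open all bins, write every k-mer, then close each handle once,
-- even if writing throws (hypothetical helper).
binKmers :: FilePath -> Int -> [B.ByteString] -> IO ()
binKmers dir n kmers = do
  handles <- openBins dir n
  mapM_ (printMer n handles) kmers
    `finally` mapM_ hClose (Map.elems handles)

main :: IO ()
main = binKmers "out" 2 (map B.pack ["ACGTACGT", "ACTTTTTT", "TGAACGGG"])
```

Note that mapM_ runs its actions one after another, so there is no race
between writers here; the only thing that changes is that each file is
opened a single time.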

On Monday, July 27, 2015, Youens-Clark, Charles Kenneth - (kyclark) <
kycl...@email.arizona.edu> wrote:

> On Jul 25, 2015, at 10:43 AM, Nicholas Ingolia <n...@ingolia.org> wrote:
> >
> > I would suggest opening all files at the beginning. Generate all k-mers,
> > open a file for each, and make a HashMap from k-mer to file handle. When
> > you're done, run through the HashMap to close all file handles.
> >
> > Also, this lets you keep your sequences/k-mers/hash keys as byte strings
> > for iterating reads, and convert only during the initial file opening
> > pass.
>
> I've tried to implement most of these suggestions in the code below except
> for the one to hash all the k-mers first because I'm working with
> multi-gigabyte files.  I can't fit that much data into memory.  Mostly this
> seems like a great approach, but, when I run it, I get this exception:
>
> *** Exception: out/TGAAC: openFile: resource busy (file is locked)
>
> Is this because the map parallelizes the file writing such that there's a
> race condition for the file handles?
>
> Ken
>
> import Bio.Core.Sequence
> import Bio.Sequence.Fasta
> import Control.Monad
> import qualified Data.ByteString.Lazy.Char8 as B
> import qualified Data.HashMap as Map
> import System.IO
> import System.Environment
>
> main :: IO ()
> main = do
>   [f] <- getArgs
>   reads <- readFasta f
>   let kmers       = concatMap (findKmers 20 . unSD . seqdata) reads
>   let allMers     = replicateM 5 "ACTG"
>   let fileHandles = Map.fromList $
>                     map (\x -> (x, openFile ("out/" ++ x) WriteMode)) allMers
>
>   mapM_ (printMer fileHandles) kmers
>   mapM_ (\ioh -> do { h <- ioh; hClose h }) $ Map.elems fileHandles
>   putStrLn "Done."
>
> -- # --------------------------------------------------
> findKmers :: Integer -> B.ByteString -> [B.ByteString]
> findKmers k xs = findKmers' n k xs
>   where n = toInteger (B.length xs) - k + 1
>         findKmers' n' k' xs'
>           | n' > 0 = B.take (fromIntegral k') xs'
>              : findKmers' (n' - 1) k' (B.tail xs')
>           | otherwise = []
>
> -- # --------------------------------------------------
> printMer :: Map.Map String (IO Handle) -> B.ByteString -> IO ()
> printMer fileHandles mer = do
>   let bin    = toString (B.take 5 mer)
>   let handle = Map.lookup bin fileHandles
>   case handle of
>     Nothing  -> putStrLn $ "Missing " ++ bin ++ " handle"
>     Just ioh -> do h <- ioh
>                    B.hPutStrLn h (B.drop 5 mer)
