It looks like you're opening the files in printMer but not also closing them. The second mapM_ opens them again and then immediately closes them, which doesn't seem very useful.
On Monday, July 27, 2015, Youens-Clark, Charles Kenneth - (kyclark) < kycl...@email.arizona.edu> wrote: > On Jul 25, 2015, at 10:43 AM, Nicholas Ingolia <n...@ingolia.org > <javascript:;>> wrote: > > > > I would suggest opening all files at the beginning. Generate all k-mers, > > open a file for each, and make a HashMap from k-mer to file handle. When > > you're done, run through the HashMap to close all file handles. > > > > Also, this lets you keep your sequences/k-mers/hash keys as byte strings > > for iterating reads, and convert only during the initial file opening > > pass. > > I've tried to implement most of these suggestions in the code below except > for the one to hash all the k-mers first because I'm working with > multi-gigabyte files. I can't fit that much data into memory. Mostly this > seems like a great approach, but, when I run it, I get this exception: > > *** Exception: out/TGAAC: openFile: resource busy (file is locked) > > Is this because the map parallelizes the file writing such that there's a > race condition for the file handles? > > Ken > > import Bio.Core.Sequence > import Bio.Sequence.Fasta > import Control.Monad > import qualified Data.ByteString.Lazy.Char8 as B > import qualified Data.HashMap as Map > import System.IO > import System.Environment > > main :: IO () > main = do > [f] <- getArgs > reads <- readFasta f > let kmers = concatMap (findKmers 20 . unSD . seqdata) reads > let allMers = replicateM 5 "ACTG" > let fileHandles = Map.fromList $ > map (\x -> (x, openFile ("out/" ++ x) WriteMode)) > allMers > > mapM_ (printMer fileHandles) kmers > mapM_ (\ioh -> do { h <- ioh; hClose h }) $ Map.elems fileHandles > putStrLn "Done." > > -- # -------------------------------------------------- > findKmers :: Integer -> B.ByteString -> [B.ByteString] > findKmers k xs = findKmers' n k xs > where n = toInteger (B.length xs) - k + 1 > findKmers' n' k' xs' > | n' > 0 = B.take (fromIntegral k') xs' > : findKmers' (n' - 1) k' (B.tail xs') > | otherwise = [] > > -- # -------------------------------------------------- > printMer :: Map.Map String (IO Handle) -> B.ByteString -> IO () > printMer fileHandles mer = do > let bin = toString (B.take 5 mer) > let handle = Map.lookup bin fileHandles > case handle of > Nothing -> putStrLn $ "Missing " ++ bin ++ " handle" > Just ioh -> do h <- ioh > B.hPutStrLn h (B.drop 5 mer)