Hello Daniel: Thanks! I employed mapM'_ but I am still getting the space leak. Any other hint?
Arnoldo On Wed, Mar 10, 2010 at 10:40 PM, Daniel Fischer <daniel.is.fisc...@web.de>wrote: > Am Mittwoch 10 März 2010 21:45:56 schrieb Arnoldo Muller: > > Hello, > > > > I am learning haskell and I found a space leak that I find difficult to > > solve. I've been asking at #haskell but we could not solve > > the issue. > > > > I want to lazily read a set of 22 files of about 200MB each, filter them > > and then I want to output the result into a unique file. > > If I modify the main function to work only with one input file, the > > program runs without issues. I will call this version A. > > Version B uses a mapM_ to iterate over a list of filenames and uses > > appendFile to output the result of filtering each file. > > In this case the memory usage grows sharply and quickly (profiles show > > constant memory growth). In less than a minute, memory > > occupation will make my system hang with swapping. > > No work is been done until the end, when all is tried to be done > simultaneously. Make sure genomeExecute ... input1 has actually finished > its work before genomeExecute ... input2 starts etc. > > One way is to use a stricter version of sequence_, > > sequence'_ :: Monad m => [m a] -> m () > sequence'_ (x:xs) = do > a <- x > a `seq` sequence'_ xs > sequence'_ [] = return () > > (nicer with BangPatterns, but not portable), and > > mapM'_ f = sequence'_ . map f > > Another option is making genomeExecute itself stricter. > > > > > This is version B: > > > > ------------------------------- Program B > > ------------------------------------------------------------------------ > >-------------------------------------------- import Data.List > > import System.Environment > > import System.Directory > > import Control.Monad > > > > > > -- different types of chromosomes > > data Chromosome = C1 > > > > | C2 > > | C3 > > | C4 > > | C5 > > | C6 > > | C7 > > | C8 > > | C9 > > | C10 > > | C11 > > | C12 > > | C13 > > | C14 > > | C15 > > | C16 > > | C17 > > | C18 > > | C19 > > | CX > > | CY > > | CMT > > > > deriving (Show) > > -- define a window > > type Sequence = [Char] > > -- Window data > > data Window = Window { sequen :: Sequence, > > chrom :: Chromosome, > > pos :: Int > > } > > -- print a window > > instance Show Window where > > show w = (sequen w) ++ "\t" ++ show (chrom w) ++ "\t" ++ show (pos > > w) > > > > -- Reading fasta files with haskell > > > > -- Initialize the > > main = do > > -- get the arguments (intput is > > [input, output, windowSize] <- getArgs > > -- get directory contents (only names) > > names <- getDirectoryContents input > > -- prepend directory > > let fullNames = filter isFastaFile $ map (\x -> input ++ "/" ++ > > x) names > > let wSize = (read windowSize)::Int > > -- process the directories > > mapM (genomeExecute output wSize filterWindow) fullNames > > > > > > -- read the files one by one and write them to the output file > > genomeExecute :: String -> Int -> (Window -> Bool) -> String -> IO () > > genomeExecute outputFile windowSize f inputFile = do > > fileData <- readFile inputFile > > appendFile outputFile $ fastaExtractor fileData windowSize f > > >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe