On Monday, 19 April 2010, 17:53:27, Arnoldo Muller wrote:
> Hello Daniel:
>
> My %GC time is 75.0% (81.4% elapsed) and I am compiling with -O2.

Very bad. Can I see the code?

> Thank you for clarifying about the pointers.

Not to forget the Ints for counting.

> Slowly my memory usage grows and eventually it explodes. I would
> expect the list comprehension to be evaluated lazily, and that
> therefore at any given time I am only executing one hamming distance.
> The result of each hamming distance is stored in a small statistics
> datatype I built (it only stores the sums, the sums of squares and
> the counts). This datatype is updated using a foldr.

That might very well be the problem. If you update it with a foldr, you
must construct the entire list of 2000^2 hamming-thunks before the work
can begin. It's probably better to use foldl' (and make the fields of
the statistics type strict), so the work can start immediately.
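Something along these lines (a minimal sketch only; your statistics
type isn't shown in this thread, so Stats, addDist and statsOf below
are made-up names and the fields are assumptions):

    import Data.List (foldl')

    -- Hypothetical statistics type: count, sum and sum of squares of
    -- the distances.  The strict fields make sure the running sums are
    -- forced whenever the Stats value itself is forced.
    data Stats = Stats !Int !Int !Int
        deriving Show

    addDist :: Stats -> Int -> Stats
    addDist (Stats n s sq) d = Stats (n + 1) (s + d) (sq + d * d)

    -- foldl' forces the accumulator at every step, so each distance is
    -- consumed as soon as it is produced and no chain of thunks builds
    -- up behind you.
    statsOf :: [Int] -> Stats
    statsOf = foldl' addDist (Stats 0 0 0)

Note that foldl' only forces the accumulator to weak head normal form;
that is exactly why the strictness annotations on the fields matter,
otherwise you would just trade a chain of list thunks for three chains
of (+)-thunks inside the Stats.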
> I have no idea where the leak is. What do you see in a .prof file to
> find a leak (hamming distance has the largest amount of time and
> %alloc)?

For finding leaks, heap profiling (-h*) gives more information than -p.
The .prof file says more about where you spend your time than about
what holds on to memory. (A sample heap-profiling session is in the
P.S. at the end of this mail.)

> From my .prof file, where would you start looking?

- use hamming instead of hamming2
- convertIntToDouble looks suspicious (a plain fromIntegral should
  normally be all that is needed to turn an Int into a Double)
- calculating a few million Hamming distances takes some time, but what
  about getMyStats, should that really take 25%?

> > > filter (\x -> x /= 0) $ map (uncurry hammingX)
> > >     [(xs, ys) | xs <- exampl, ys <- exampl]

    filter (/= 0) [hamming xs ys | xs <- exampl, ys <- exampl]

And of course you can trivially avoid half of the work: the distance is
symmetric, and the distance of a string to itself is zero, so it is
enough to pair each string only with the ones that follow it.
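For instance (a sketch; allDistances is a made-up name, and hamming is
reproduced from your message so the snippet stands alone):

    {-# LANGUAGE BangPatterns #-}

    import Data.List (tails)

    -- hamming as defined earlier in this thread
    hamming :: String -> String -> Int
    hamming x y = hamming' x y 0
      where
        hamming' [] _ !c = c
        hamming' _ [] !c = c
        hamming' (x:xs) (y:ys) !c
          | x == y    = hamming' xs ys c
          | otherwise = hamming' xs ys (c + 1)

    -- Each string is paired only with the strings that follow it, so
    -- every unordered pair is computed exactly once and the zero
    -- self-distances are skipped entirely.  The (xs:rest) pattern
    -- silently drops the final empty list produced by tails.
    allDistances :: [String] -> [Int]
    allDistances xss =
        [hamming xs ys | (xs:rest) <- tails xss, ys <- rest]

If your statistics really need every ordered pair, you can simply count
each result twice instead of computing it twice.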
> Best Regards,
> Arnoldo Muller
>
> On Mon, Apr 19, 2010 at 3:18 AM, Daniel Fischer
> <daniel.is.fisc...@web.de> wrote:
> > On Monday, 19 April 2010, 01:03:14, Arnoldo Muller wrote:
> > > Hello all:
> > >
> > > I want to generate some hamming distance statistics about a set
> > > of strings. As explained in another e-mail to this list, I used
> > > the following code to call the functions
> > > (exampl holds the list of strings of size w):
> > >
> > > filter (\x -> x /= 0) $ map (uncurry hammingX)
> > >     [(xs, ys) | xs <- exampl, ys <- exampl]
> > >
> > > I have two hamming functions:
> > >
> > > -- hamming distance for variable length strings
> > > hamming :: String -> String -> Int
> > > hamming x y = hamming' x y 0
> > >   where
> > >     hamming' [] _ !c = c
> > >     hamming' _ [] !c = c
> > >     hamming' (x:xs) (y:ys) !c
> > >       | x == y    = hamming' xs ys c
> > >       | otherwise = hamming' xs ys (c + 1)
> > >
> > > -- function posted in this mailing list
> > > hamming2 :: String -> String -> Int
> > > hamming2 xs ys = length (filter not (zipWith (==) xs ys))
> > >
> > > I am executing these functions millions of times, and the
> > > bottleneck of my program is in them, as shown by running in
> > > profiling mode with +RTS -K400M -p -RTS.
> > >
> > > The costlier function is the hamming distance:
> > >
> > > COST CENTRE  MODULE      %time  %alloc
> > > hamming      Distances    66.6    41.9
> > >
> > > It says that it is performing 41% of the allocations. In the case
> > > of hamming2, the allocations go as high as 52%.
> >
> > Allocations are cheap, so that's not necessarily a problem. More
> > important is: what's the maximum residency, and how much is copied
> > during GC? Are you compiling with -O2?
> >
> > > I could understand that there are allocations in "hamming2"
> > > because we are creating pairs, but in the case of "hamming" there
> > > should be no allocation.
> >
> > Why not? I don't know how GHC counts allocations, but every time
> > you go from (x:xs) to xs, you need a new pointer to the tail. If
> > that counts as allocation, hamming must allocate a lot, too.
> >
> > > How can I execute my hamming functions without allocating memory?
> > >
> > > Best regards,
> > >
> > > Arnoldo Muller
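P.S. In case it helps, a typical heap-profiling session looks roughly
like this (the program name Main is made up, and -auto-all is the
cost-centre flag of the GHC 6.x series; adjust to your setup):

    ghc -O2 --make -prof -auto-all Main.hs
    ./Main +RTS -hc -RTS       # writes Main.hp
    hp2ps -c Main.hp           # renders the heap graph to Main.ps

The other -h flavours (-hy by type, -hd by closure description, -hr by
retainer) slice the heap differently and are often what you need to
pin down what is being retained and why.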