Am Montag 19 April 2010 14:13:53 schrieb Heinrich Apfelmus: > Arnoldo Muller wrote: > > I want to generate some hamming distance statistics about a set of > > strings. > > > > filter (\x -> x /= 0) $ > > map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl] > > > > [...] > > > > -- function posted in this mailing list > > hamming2 :: String -> String -> Int > > hamming2 xs ys = length (filter not (zipWith (==) xs ys)) > > > > I am executing these functions millions of times and the bottleneck of > > my program is in them as explained by running in profiling mode with > > +RTS -K400M -p -RTS > > > > The costlier function is the hamming distance > > COST CENTRE MODULE %time %alloc > > > > hamming Distances 66.6 41.9 > > Another way to look at it is that you shouldn't optimize hamming > itself, but rather make sure that it's called less often! > > For instance, your expression can be replaced by > > filter (/=0) [hammingX x y | (x:xs) <- tails example, y <- xs] > > which cuts the total running time in half. It's still quadratic in the > length of example . I'm sure there are faster algorithms out there that
If it's likely that there are many repetitions, collect the Strings in a Set/Map (depending on whether you're interested in the count) and call hamming only on the distinct pairs. > can bring it down to O(n log n) if you want. I don't think so. You can't calculate the Hamming distance of x and z from the distances between x and y and y and z, so you'd have to consider all pairs of distinct strings, wouldn't you? > > > Regards, > Heinrich Apfelmus _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe