Dear all, I have implemented a small module to generate random items with a given probability distribution using the alias approach [1] and unfortunately compared to similar implementations in C++ or Java it is about 10 times slower. I have to confess that I am still in the early stages of writing Haskell programs so there is hopefully something I can do about it and I hope some helpful soul on this list can give me a hint.
I have attached my implementation and a small testing routine which runs in about 5 seconds on my machine (when compiled with -O2) whereas my Java-Implementation finishes in about 0.48 seconds. Profiling indicates that the time is mostly spend in System.Random.random and System.Random.randomR so I wonder if these are slow or what else might cause the relative slowness of my implementation. Many thanks for your responses, Thoralf PS: I would also appreciate any feedback about the module from a design perspective. I bet I miss lots of good Haskell idioms. ---------------- [1] The alias methods moves probability mass of a non-uniform probability distribution around to create a uniform distribution. Lets say you have three items "a", "b", and "c" with distribution [0.2, 0.1, 0.7] then a uniform distribution would assign 1/3 to each so "a" and "b" need to be "filled up" with exactly the same probability mass that "c" has too much. Then 2 uniform random numbers are generated. The first one to pick an item and the second one to pick either the item itself if the value is in the original part or the alias otherwise. A much better explanation can be found on the web somewhere. Anyway it should not matter with regards to the performance problem I have.
module Main where import Random import System.Random import Data.Array import Data.List main = do g <- getStdGen let k = ["a", "b", "c"] n = 1000000 :: Int t = setup k [0.2, 0.1, 0.7] :: (Array Int String, Array Int String, Array Int Double) print $ foldl' (\a b -> if b == "b" then a + 1 else a) 0 (take n $ randomList t g) / fromIntegral n
module Random (setup, next, randomList) where import System.Random hiding (next) import Data.Array.IArray -- Given a list of items and a list of their propabilities generate a tripel -- consisting of the values vector, the alias vector and the relative propabilities -- vector which is used in applications of next, etc. setup :: (Ord a, IArray a2 a, IArray a1 e, Num a) => [e] -> [a] -> (a1 Int e, a1 Int e, a2 Int a) setup xs ps = let n = length ps xv = listArray (0, n - 1) xs rv = listArray (0, n - 1) [fromIntegral n * p | p <- ps] (low, high) = splitAt 1 rv (indices rv) [] [] (a, r) = calcAlias xv xv rv high low in (xv, a, r) where -- Return a pair of lists, the first consisting of elements lower than -- given threshold and the second with elements greater than threshold, -- equal elements are ignored. splitAt t v [] l h = (l, h) splitAt t v (i:is) l h = case v!i of x | x < t -> splitAt t v is (i:l) h | x > t -> splitAt t v is l (i:h) | otherwise -> splitAt t v is l h -- Given an list of highs and a list of lows, calculate the alias vector and the relative -- propabilities vector. calcAlias :: (Ord e, Num e, IArray a e, Ix i, IArray a2 e1, IArray a1 e1) => a2 i e1 -> a1 i e1 -> a i e -> [i] -> [i] -> (a1 i e1, a i e) calcAlias xv av rv [] _ = (av, rv) calcAlias xv av rv _ [] = (av, rv) calcAlias xv av rv hi@(h:hs) (l:ls) = let av' = av//[(l, xv!h)] rv' = rv//[(h, rv!h + rv!l - 1)] in if rv'!h >= 1 then calcAlias xv av' rv' hi ls else calcAlias xv av' rv' hs (h:ls) -- Generate a random item according to the given propability distribution as specified -- by the given tripel which is the result of applying "setup" to a list of items and -- a list of propabilities. next :: (IArray a2 e1, IArray a e1, Ord e, IArray a1 e, RandomGen t, Random e) => (a Int e1, a2 Int e1, a1 Int e) -> t -> (e1, t) next (xs, as, rs) g = let n = length $ indices xs (x1, g1) = randomR (0, n - 1) g (x2, g2) = random g1 r = rs!x1 in if x2 <= r then (xs!x1, g2) else (as!x1, g2) -- Generate a infinite list of random items according to the specified propability -- distribution as given by the triple that results from applying setup to a pair -- of a list of items and a list of propabilities (see "setup" for details). randomList :: (Random e, RandomGen t1, IArray a2 e, Ord e, IArray a t, IArray a1 t) => (a Int t, a1 Int t, a2 Int e) -> t1 -> [t] randomList t g = let (n, g') = next t g in n:randomList t g'
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe