At this point, given that lazier random generation seems to be better, using unboxed types for this seems a losing idea, as they'd force strictness all over again, so that's not worth trying to salvage.
I'm still curious if the pre-calculation of partial sums that I did works well across processors, as I don't see why it shouldn't. My less-strictified version of Don's code is attached, and below are the functions you'll need to insert/replace to make the partial-sums optimization work.
Regards, SterlP.S., if you're running on a unix, I find it much more convenient to use the time program rather than rolling timing into my own code. I tested this program using, for example > time ./fastaRefUnStrict 250000 | tail
fastaRefUnStrict.hs
Description: Binary data
--
Code for partial sums:
choose :: [(Word8,Float)] -> Float -> Word8
choose [(b,_)] _ = b
choose ((b,f):xs) p = if p < f then b else choose xs p
makeCumul :: [(Word8,Float)]->[(Word8,Float)]
makeCumul freqMap = tail . reverse . foldl' fm [(undefined,0)] $ freqMap
where fm acc@((_,ct):rst) (w,f) = (w,ct + f) : acc
iubs :: [(Word8,Float)]
iubs = makeCumul $ map (first c2w)
[('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02)
,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02)
,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)]
homs :: [(Word8,Float)]
homs = makeCumul $ map (first c2w)
[('a',0.3029549426680),('c',0.1979883004921)
,('g',0.1975473066391),('t',0.3015094502008)]
On Nov 27, 2007, at 2:09 PM, Richard Kelsall wrote:
Simon Peyton-Jones wrote:| Something I found with Dons version on my machine was that if I removed | all the exclamation marks and the -fbang-patterns bit at the top it went | about 20% faster as well as being much cleaner code, but with my very | rudimentary understanding of Haskell I wasn't entirely sure it would | produce the same results if I did this and didn't get round to checking. If, after investigation (and perhaps checking with Don) you find that adding bangs makes your program go slower, even though the function is in fact strict (otherwise it might go slower because it's just doing more work!) then I'd love to see a test case.Sorry, I don't understand the code, I've jumped in the deep-end beforelearning to swim, but I can now tell you it's producing the same results when I remove some of the exclamation marks. I've checked with an MD5 onthe output. The timings in seconds for 10,000,000 iterations averaged over 5 runs. (There was quite a bit of variation.) Compiled with GHC 6.6.1. (I got stuck compiling it under 6.8) The fancy compile options are from the shootout page. Dons original program 13.26 compiled ghc --make Dons original program 12.54 compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4 Removed 3 bangs from rand 11.47 compiled ghc --make Removed 3 bangs from rand 11.57 compiled with -O -fglasgow-exts -optc-mfpmath=sse -optc-msse2 -optc-march=pentium4 The code below is Dons program fromhttp://shootout.alioth.debian.org/gp4/benchmark.php? test=fasta&lang=ghc&id=0with a timing function added by me. The rand function is where I removed three exclamation marks to make the program faster. Previously I removeddifferent combinations of bangs. Some bangs seem to make it faster and some seem to make it slower. Richard. ------------------------------------------------------------------{-# OPTIONS -O2 -optc-O2 -optc-ffast-math -fbang-patterns -fexcess- precision #-}-- -- The Computer Language Benchmarks Game -- http://shootout.alioth.debian.org/ -- -- Contributed by Don Stewart -- A lazy bytestring solution. -- -- Add: -- -optc-mfpmath=sse -optc-msse2 -- import System import Data.Word import Control.Arrow import Text.Printf -- RK added. import System.CPUTime -- RK added. import qualified Data.ByteString.Lazy as L import qualified Data.ByteString.Lazy.Char8 as C (pack,unfoldr) import qualified Data.ByteString as S import Data.ByteString.Base -- RK added this time function. time :: IO t -> IO t time a = do start <- getCPUTime v <- a end <- getCPUTime let diff = (fromIntegral (end - start)) / (10 ^12) printf "Calc time %0.3f \n" (diff :: Double) return v main = do -- RK modified main to time the computation. time $ comp -- RK mod. comp :: IO Int -- RK mod.comp = do -- RK mod. This was Dons main. I just renamed to comp.n <- getArgs >>= readIO . head writeFasta "ONE" "Homo sapiens alu" (n*2) (L.cycle alu) g <- unfold "TWO" "IUB ambiguity codes" (n*3) (look iubs) 42 unfold "THREE" "Homo sapiens frequency" (n*5) (look homs) g---------------------------------------------------------------------- ---- -- lazily unfold the randomised dna sequences -- unfold l t n f !g = putStrLn (">" ++ l ++ " " ++ t) >> unroll f g n unroll :: (Int -> (Word8, Int)) -> Int -> Int -> IO Int unroll f = loop where loop r 0 = return r loop !r !i = case S.unfoldrN m (Just . f) r of (!s, Just r') -> do S.putStrLn s loop r' (i-m) where m = min i 60 look ds !k = let (d,j) = rand k in (choose ds d, j) choose :: [(Word8,Float)] -> Float -> Word8 choose [(b,_)] _ = b choose ((!b,!f):xs) !p = if p < f then b else choose xs (p-f)---------------------------------------------------------------------- ---- -- only demand as much of the infinite sequence as we require writeFasta label title n s = do putStrLn $ ">" ++ label ++ " " ++ title let (t:ts) = L.toChunks s go ts t n where go ss !s !n| l60 && n60 = S.putStrLn l >> go ss r (n-60) | n60 = S.putStr s >> S.putStrLn a >> go (tail ss) b (n-60)| n <= ln = S.putStrLn (S.take n s)| otherwise = S.putStr s >> S.putStrLn (S.take (n-ln) (head ss))where !ln = S.length s !l60 = ln >= 60 !n60 = n >= 60 (l,r) = S.splitAt 60 s (a,b) = S.splitAt (60-ln) (head ss)---------------------------------------------------------------------- --im = 139968 ia = 3877 ic = 29573 rand :: Int -> (Float, Int) rand seed = (newran,newseed) -- RK modified. Was !seed where newseed = (seed * ia + ic) `rem` im -- RK mod. Was !newseednewran = 1.0 * fromIntegral newseed / imd -- RK. Was ! newranimd = fromIntegral im---------------------------------------------------------------------- --alu = C.pack "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG\ \GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA\ \CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT\ \ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA\ \GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG\ \AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC\ \AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA" iubs = map (c2w *** id) [('a',0.27),('c',0.12),('g',0.12),('t',0.27),('B',0.02) ,('D',0.02),('H',0.02),('K',0.02),('M',0.02),('N',0.02) ,('R',0.02),('S',0.02),('V',0.02),('W',0.02),('Y',0.02)] homs = map (c2w *** id) [('a',0.3029549426680),('c',0.1979883004921) ,('g',0.1975473066391),('t',0.3015094502008)] _______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell-cafe
