On 03/19/2013 10:32 PM, Don Stewart wrote:
Oh, I forgot the technique of inlining the lazy bytestring chunks, and
processing each chunk seperately.
$ time ./fast
4166680
./fast 1.25s user 0.07s system 99% cpu 1.325 total
Essentially inline Lazy.foldlChunks and specializes is (the inliner
should really get that).
And now we have a nice unboxed inner loop, which llvm might spot:
$ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm
$ time ./fast
4166680
./fast 1.07s user 0.06s system 98% cpu *1.146 total*
So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :)
Thanks Don, but after some investigation I came to conclusion that
problem is in State monad
{-# LANGUAGE BangPatterns #-}
import Control.Monad.State.Strict
data S6 = S6 !Int !Int
main_6 = do
let r = evalState go (S6 10000 0)
print r
where
go = do
(S6 i a) <- get
if (i == 0) then return a else (put (S6 (i - 1) (a + i))) >> go
main_7 = do
let r = go (S6 10000 0)
print r
where
go (S6 i a)
| i == 0 = a
| otherwise = go $ S6 (i - 1) (a + i)
main = main_6
main_6 doing constant allocations while main_7 run in constant space.
Can you suggest something that improve situation? I don't want to
manually unfold all my code that I want to be fast :(.
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe