[Haskell-cafe] Fusion for fun and profi (Was: newbie optimization question)
rendel: Prabhakar Ragde wrote: divisors i = [j | j-[1..i-1], i `mod` j == 0] main = print [i | i-[1..1], i == sum (divisors i)] Jerzy Karczmarczuk wrote: My point didn't concern that point. Haskell compiler cannot change an algorithm using lists into something which deals with indexable arrays, usually faster. Indexing may be faster than the indirection, and the allocation of memory costs. And there is laziness... This may be true, but it isn't relevant in this case, since the obvious c program doesn't need any arrays, only two loops: for (int i = 1; i = 1; i++) { int sum = 0; for (int j = 1; j i; j++) if (i % j == 0) sum += i; if (sum == i) print(i); } Loops can be expressed with lazy lists in Haskell. Therefore, the presented Haskell program is perfectly equivalent to the obvious c program. So what we would hope is that GHC could transform a set of composed lazy list functions into a doubly nested strict loop in Int#... Let's see if we can get that result from GHC, using a bit of fusion. First, to explain what is happening, let's first replace the `mod` call with `rem`, which is faster, and then desugar the list comprehension and enumerations syntax, to expose the underlying program: default(Int) divisors i = filter (\j - i `rem`j == 0) (enumFromTo 1 (i-1)) main = print $ filter (\i - i == sum (divisors i)) (enumFromTo 1 1) Looking at this we see some good chances for fusion to take place: the enumFromTo should fuse twice with 'filter', using build/foldr fusion. And with stream fusion, the left fold 'sum' should also fuse with pipeline that results from divisors. So my prediction would be that this program would run slightly faster with stream fusion. Let's see... Compiling with -O2 and ghc 6.8 snapshot, with build/foldr fusion, we see two fusion sites, as expected, and a spec-constr of 'sum: $ ghc-6.9.20070916 A.hs -O2 -ddump-simpl-stats RuleFired 1 SPEC Data.List.sum 2 fold/build Good, running this: $ time ./A-stream [6,28,496,8128] ./A-stream 1.29s user 0.02s system 99% cpu 1.319 total Now we can try with stream fusion, using the stream-fusible list library here: http://www.cse.unsw.edu.au/~dons/code/streams/list/ To use these list functions in preference to the default, we have to import: import Prelude hiding (filter,sum,enumFromTo) import Data.List.Stream and since the base library doesn't include stream fusible enumFromTo, we'll have to write our own definition in terms of stream: import Data.Stream (enumFromToInt,unstream) enumFromTo i j = unstream (enumFromToInt i j) Ok, that's easy. Compiling this, we hope to see 3 fusion sites, and all heap-allocated Ints removed: $ ghc-6.9.20070916 A.hs -O2 -ddump-simpl-stats -package list RuleFired 2 filter - fusible 1 sumInt - fusible 1 sum spec Int 3 STREAM stream/unstream fusion Terrific! The 'sum' was specialised to Int, then translated to a stream version, the two filters also were translated, then 3 fusion sites were found and fused. Our program should now run faster: $ time ./A-stream [6,28,496,8128] ./A-stream 1.23s user 0.01s system 99% cpu 1.251 total And so it does, with no list allocated for the sum loop. In fact the entire program reduces to a strict unboxed nested loop: unfold = Int# - [Int] wloop_sum_sV5 :: Int# - Int# - Int# So almost identical types to the C program (bar for the return [Int]). Finally, we can manually translate the C code into a confusing set of nested loops with interleaved IO, main = loop 1 where loop !i | i 1 = return () | otherwise = if i == go i 0 1 then print i loop (i+1) else loop (i+1) go !i !s !j | j = i-1 = if i `rem` j == 0 then go i (s+j) (j+1) else go i s (j+1) | otherwise = s And we get *no speed benefit* at all! time ./A-loop 1.24s user 0.00s system 98% cpu 1.256 total So the lesson is: write in a high level style, and the compiler can do the work for you. Or, GHC is pretty smart on high level code. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusion for fun and profi (Was: newbie optimization question)
On Sun, Oct 28, 2007 at 01:25:19PM -0700, Don Stewart wrote: Finally, we can manually translate the C code into a confusing set of nested loops with interleaved IO, main = loop 1 where loop !i | i 1 = return () | otherwise = if i == go i 0 1 then print i loop (i+1) else loop (i+1) go !i !s !j | j = i-1 = if i `rem` j == 0 then go i (s+j) (j+1) else go i s (j+1) | otherwise = s And we get *no speed benefit* at all! time ./A-loop 1.24s user 0.00s system 98% cpu 1.256 total So the lesson is: write in a high level style, and the compiler can do the work for you. Or, GHC is pretty smart on high level code. IO blocks unboxing in GHC. How fast is your mock-C code refactored to do IO outside of the loops only? Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusion for fun and profi (Was: newbie optimization question)
stefanor: On Sun, Oct 28, 2007 at 01:25:19PM -0700, Don Stewart wrote: Finally, we can manually translate the C code into a confusing set of nested loops with interleaved IO, main = loop 1 where loop !i | i 1 = return () | otherwise = if i == go i 0 1 then print i loop (i+1) else loop (i+1) go !i !s !j | j = i-1 = if i `rem` j == 0 then go i (s+j) (j+1) else go i s (j+1) | otherwise = s And we get *no speed benefit* at all! time ./A-loop 1.24s user 0.00s system 98% cpu 1.256 total So the lesson is: write in a high level style, and the compiler can do the work for you. Or, GHC is pretty smart on high level code. IO blocks unboxing in GHC. How fast is your mock-C code refactored to do IO outside of the loops only? It doesn't! The above code yields: Main.$wloop :: GHC.Prim.Int# - GHC.Prim.State# GHC.Prim.RealWorld - (# GHC.Prim.State# GHC.Prim.RealWorld, () #) $wgo_rMK :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# where $s$wgo_rMI :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusion for fun and profi (Was: newbie optimization question)
On Sun, Oct 28, 2007 at 01:43:07PM -0700, Don Stewart wrote: stefanor: IO blocks unboxing in GHC. How fast is your mock-C code refactored to do IO outside of the loops only? It doesn't! The above code yields: Main.$wloop :: GHC.Prim.Int# - GHC.Prim.State# GHC.Prim.RealWorld - (# GHC.Prim.State# GHC.Prim.RealWorld, () #) $wgo_rMK :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# where $s$wgo_rMI :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# Ah, interesting. I was reading something too general from http://hackage.haskell.org/trac/ghc/ticket/1592#comment:5. Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusion for fun and profi (Was: newbie optimization question)
On 10/28/07, Stefan O'Rear [EMAIL PROTECTED] wrote: On Sun, Oct 28, 2007 at 01:43:07PM -0700, Don Stewart wrote: stefanor: IO blocks unboxing in GHC. How fast is your mock-C code refactored to do IO outside of the loops only? It doesn't! The above code yields: Main.$wloop :: GHC.Prim.Int# - GHC.Prim.State# GHC.Prim.RealWorld - (# GHC.Prim.State# GHC.Prim.RealWorld, () #) $wgo_rMK :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# where $s$wgo_rMI :: GHC.Prim.Int# - GHC.Prim.Int# - GHC.Prim.Int# Ah, interesting. I was reading something too general from http://hackage.haskell.org/trac/ghc/ticket/1592#comment:5. Right, unboxing is successful here because the arguments are marked as strict with (!) annotations. The IO hack described in that bug report would only be a problem if the arguments were not marked strict, and were used in code that has to be executed after an IO call. Although I don't think all of the strictness annotations are even necessary here. In any case, loop must evaluate i before the call to print; there would only be an issue if it called an IO function before doing anything that demands i. Cheers, Tim -- Tim Chevalier * catamorphism.org * Often in error, never in doubt Accordingly, computer scientists commonly choose models which have bottoms, but prefer them topless. -- Davey Priestley, _Introduction to Lattices and Order_ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusion for fun and profi (Was: newbie optimization question)
dons: stefanor: On Sun, Oct 28, 2007 at 01:25:19PM -0700, Don Stewart wrote: Finally, we can manually translate the C code into a confusing set of nested loops with interleaved IO, main = loop 1 where loop !i | i 1 = return () | otherwise = if i == go i 0 1 then print i loop (i+1) else loop (i+1) go !i !s !j | j = i-1 = if i `rem` j == 0 then go i (s+j) (j+1) else go i s (j+1) | otherwise = s And we get *no speed benefit* at all! time ./A-loop 1.24s user 0.00s system 98% cpu 1.256 total So the lesson is: write in a high level style, and the compiler can do the work for you. Or, GHC is pretty smart on high level code. Oh, and we can fuse with the print loop too, yielding an entire program of type Int# - IO (), and really no intermediate lists (even for the return list). Again, we need to use the fusible version of mapM_: import Prelude hiding (filter,sum,enumFromTo,mapM_,sequence_,map,foldr) import Data.List.Stream import Data.Stream (enumFromToInt,unstream) enumFromTo i j = unstream (enumFromToInt i j) mapM_ f as = sequence_ (map f as) sequence_ ms = foldr () (return ()) ms -- ^ fuse happily default(Int) main = mapM_ print $ filter (\i - i == sum (divisors i)) (enumFromTo 1 1) divisors i = filter (\j - i `rem`j == 0) (enumFromTo 1 (i-1)) And we see the map and foldr fuse in sequence, which in turn fuses with the filter (and rest of the program): 18 RuleFired 5 STREAM stream/unstream fusion 2 filter - fusible 1 foldr - fusible 1 map - fusible 1 sumInt - fusible The program flattens to a single nested loop, Main.$wloop_foldr :: GHC.Prim.Int# - GHC.Prim.State# GHC.Prim.RealWorld - (# GHC.Prim.State# GHC.Prim.RealWorld, () #) which really is equivalent in terms of control flow and intermediate structures to the C program. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe