Re: Re[2]: [Haskell-cafe] Fusing foldr's
On 10/29/07, Bulat Ziganshin [EMAIL PROTECTED] wrote: you may also look at these data: 1,225,416 bytes allocated in the heap 152,984 bytes copied during GC (scavenged) 8,448 bytes copied during GC (not scavenged) 86,808 bytes maximum residency (1 sample(s)) 3 collections in generation 0 ( 0.00s) 1 collections in generation 1 ( 0.00s) if your hypothesis is true, amount of data copied and number of generation-1 collection should be much less in the second case Indeed. avg4: 880,935,612 bytes allocated in the heap 319,064,404 bytes copied during GC (scavenged) 318,965,812 bytes copied during GC (not scavenged) 201,080,832 bytes maximum residency (9 sample(s)) 1681 collections in generation 0 ( 1.67s) 9 collections in generation 1 ( 13.62s) avgP: 1,761,224,604 bytes allocated in the heap 714,644 bytes copied during GC (scavenged) 593,184 bytes copied during GC (not scavenged) 184,320 bytes maximum residency (2 sample(s)) 1908 collections in generation 0 ( 0.04s) 2 collections in generation 1 ( 0.00s) Allocation is cheap, copying expensive. All the best, /Josef ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[4]: [Haskell-cafe] Fusing foldr's
Hello Josef, Tuesday, October 30, 2007, 4:13:04 PM, you wrote: 201,080,832 bytes maximum residency (9 sample(s)) 1681 collections in generation 0 ( 1.67s) 9 collections in generation 1 ( 13.62s) 184,320 bytes maximum residency (2 sample(s)) 1908 collections in generation 0 ( 0.04s) 2 collections in generation 1 ( 0.00s) Allocation is cheap, copying expensive. not copying itself, but generation-1 garbage collections. while g-0 collection scans 256kb which lives in CPU cache, g-1 collection scans entire 100-200 mb of data that is very slow. try to use -H1g option, though :) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
On 10/28/07, Isaac Dupree [EMAIL PROTECTED] wrote: Josef Svenningsson wrote: Less bogus timing: avg4: 18.0s avgS: 2.2s avgP: 17.4s OK, so these figures make an even stronger case for my conclusion :-) Single traversal can be much faster than multiple traversals *when done right*. Did you use +RTS -N2 on your program (or whatever it is that makes GHC actually use multiple threads? or is that not necessary?) Anyway I assume you wouldn't get better than 9.0s, which is still much worse than 2.2s. Oh, this is getting embarrassing.. Indeed, I forgot to use +RTS -N2. But using those flags yielded a very interesting result: avgP: 4.3s Superlinear speedup!? As you say, I would have expected something slightly larger than 9s. I think what happens here is that for avg4 the entire list has to be kept in memory between the two traversals whereas for avgP the beginning of the list can be garbage collected incrementally as the two threads traverse it. This could mean that the list never moves to the second generation in the memory manager and that can maybe account for the additional time savings. I'm not sure how to verify that this is the case though. Cheers, /Josef ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
On 10/29/07, Josef Svenningsson [EMAIL PROTECTED] wrote: But using those flags yielded a very interesting result: avgP: 4.3s Superlinear speedup!? As you say, I would have expected something slightly larger than 9s. I think what happens here is that for avg4 the entire list has to be kept in memory between the two traversals whereas for avgP the beginning of the list can be garbage collected incrementally as the two threads traverse it. This could mean that the list never moves to the second generation in the memory manager and that can maybe account for the additional time savings. I'm not sure how to verify that this is the case though. Bulat kindly suggested I use +RTS -s to monitor the garbage collectors behavior. It seems my hypothesis was right. avg4: 387 Mb total memory in use MUT time2.43s ( 2.47s elapsed) GCtime 15.32s ( 16.05s elapsed) avgP (+RTS -N2): 3 Mb total memory in use MUT time4.61s ( 2.51s elapsed) GCtime0.04s ( 0.06s elapsed) So it seems that the garbage collector takes an awful lot of time when we allocate a big list like this. Hmmm. Strikes me as somewhat suboptimal. Cheers, /Josef ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re[2]: [Haskell-cafe] Fusing foldr's
Hello Josef, Monday, October 29, 2007, 2:08:54 PM, you wrote: that can maybe account for the additional time savings. I'm not sure how to verify that this is the case though. Bulat kindly suggested I use +RTS -s to monitor the garbage collectors behavior. It seems my hypothesis was right. you may also look at these data: 1,225,416 bytes allocated in the heap 152,984 bytes copied during GC (scavenged) 8,448 bytes copied during GC (not scavenged) 86,808 bytes maximum residency (1 sample(s)) 3 collections in generation 0 ( 0.00s) 1 collections in generation 1 ( 0.00s) if your hypothesis is true, amount of data copied and number of generation-1 collection should be much less in the second case -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
On 10/26/07, Dan Weston [EMAIL PROTECTED] wrote: Thanks for letting me know about the Data.Strict library on Hackage. I will definitely make use of that! BTW, you left out an import Data.List(foldl') in your example. Yes, Data.Strict can be pretty handy for getting the right strictness. Sorry about the missing import. My timing test is an order of magnitude worse than yours. Do you have an extra zero in your list endpoint? I fed these functions to ghc with the -O2 and -threaded flags and timed them using the list [1..1000]. The result (best times out of several runs): avg4: 284 ms avgS: 184 ms avgP: 248 ms Using ghc -threaded -O2 --make Avg.hs for ghc 6.6.1, I ran your tests on [1..1000] and got the user times: avg4: 12.75 s avgS: 3.65 s avgP: 15.56 s The funny thing is that avg4/avgS = 3.5 for and only 1.5 for you. I understand that with only 1 processor my avgP time may be twice yours, but not the avgS or avg4. Oooops.. My numbers are totally bogus. I had code that looked like the following: \begin{code} main = do time avg4 [1..1000] time avg4 [1..1000] time avg4 [1..1000] time avgS [1..1000] time avgS [1..1000] time avgS [1..1000] time avgP [1..1000] time avgP [1..1000] time avgP [1..1000] \end{code} Not very elegant I know but I thought it would do the job. Apparently I was wrong. GHC does common subexpression elimination on all the lists so they're all shared between the different calls. Of course, the first function call would always take long time but I ignored it, thinking it was some anomaly. Anyway, I was totally sure that GHC only did cse on constructor expressions and not on arbitrary computations. Guess I was wrong. A little searching revealed the following quote by Simon PJ: GHC does a very simple form of CSE. If it sees let x = e in e it replaces the inner 'e' by x. But that's all at the moment. Lesson learned. Less bogus timing: avg4: 18.0s avgS: 2.2s avgP: 17.4s OK, so these figures make an even stronger case for my conclusion :-) Single traversal can be much faster than multiple traversals *when done right*. I have the following machine: Main memory size: 2026 Mbytes Num Processors: 1 Processor Type: Intel(R) Xeon(TM) CPU 2.80GHz x32 Clock Speed: 2790 MHZ In case you're still interested my machine looks like this: Memory: 2026 Mbytes Processor: AMD Turion(tm) 64 X2 Mobile Technology TL-56 Clock Speed: 1800MHz All the best, /Josef ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
Josef Svenningsson wrote: Less bogus timing: avg4: 18.0s avgS: 2.2s avgP: 17.4s OK, so these figures make an even stronger case for my conclusion :-) Single traversal can be much faster than multiple traversals *when done right*. Did you use +RTS -N2 on your program (or whatever it is that makes GHC actually use multiple threads? or is that not necessary?) Anyway I assume you wouldn't get better than 9.0s, which is still much worse than 2.2s. Isaac ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
Sorry for reacting so late on this mail. I'm digging through some old mails... On 10/12/07, Dan Weston [EMAIL PROTECTED] wrote: Always check optimizations to make sure they are not pessimizations! Actually, traversing the list twice is very cheap compared to space leakage, and accumulating pairs requires tuple boxing and unboxing which I don't know how to get GHC not to do. I agree hole-heartedly that replacing multiple traversals with a single traversal should be done with care as it more often than not results in a pessimization. Indeed you showed just that with your examples! But I'd thought it'd be interesting to see how it can actually be an improvement if done carefully. \begin{code} import Control.Arrow import qualified Data.Strict.Tuple as T import Data.Strict.Tuple (Pair(..)) import Control.Parallel avg4 = uncurry (/) . (foldl' (+) 0 foldl' (\x y - x + 1) 0) avgS = T.uncurry (/) . foldl' (\p n - ((+n) *!* (+1)) p) (0 :!: 0) avgP = uncurry (/) . (foldl' (+) 0 ! foldl' (\x y - x + 1) 0) (*!*) f g (a :!: b) = f a :!: g b (!) f g a = fa `par` (fa,ga) where fa = f a ga = g a \end{code} avg4 is the function which was best among Dan's benchmarks. avgS uses strict tuples. I just threw in avgP for fun, it traverses the lists in parallel. Note: I do have a dual core machine so it makes sense to try avgP. I fed these functions to ghc with the -O2 and -threaded flags and timed them using the list [1..1000]. The result (best times out of several runs): avg4: 284 ms avgS: 184 ms avgP: 248 ms It seems doing a single traversal can be faster if your write your function carefully. Doing the traversal in parallel was beneficial but not as good as the single traversal. Cheers, /Josef ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
Thanks for letting me know about the Data.Strict library on Hackage. I will definitely make use of that! BTW, you left out an import Data.List(foldl') in your example. My timing test is an order of magnitude worse than yours. Do you have an extra zero in your list endpoint? I fed these functions to ghc with the -O2 and -threaded flags and timed them using the list [1..1000]. The result (best times out of several runs): avg4: 284 ms avgS: 184 ms avgP: 248 ms Using ghc -threaded -O2 --make Avg.hs for ghc 6.6.1, I ran your tests on [1..1000] and got the user times: avg4: 12.75 s avgS: 3.65 s avgP: 15.56 s The funny thing is that avg4/avgS = 3.5 for and only 1.5 for you. I understand that with only 1 processor my avgP time may be twice yours, but not the avgS or avg4. I have the following machine: Main memory size: 2026 Mbytes Num Processors: 1 Processor Type: Intel(R) Xeon(TM) CPU 2.80GHz x32 Clock Speed: 2790 MHZ Josef Svenningsson wrote: Sorry for reacting so late on this mail. I'm digging through some old mails... On 10/12/07, Dan Weston [EMAIL PROTECTED] wrote: Always check optimizations to make sure they are not pessimizations! Actually, traversing the list twice is very cheap compared to space leakage, and accumulating pairs requires tuple boxing and unboxing which I don't know how to get GHC not to do. I agree hole-heartedly that replacing multiple traversals with a single traversal should be done with care as it more often than not results in a pessimization. Indeed you showed just that with your examples! But I'd thought it'd be interesting to see how it can actually be an improvement if done carefully. \begin{code} import Control.Arrow import qualified Data.Strict.Tuple as T import Data.Strict.Tuple (Pair(..)) import Control.Parallel avg4 = uncurry (/) . (foldl' (+) 0 foldl' (\x y - x + 1) 0) avgS = T.uncurry (/) . foldl' (\p n - ((+n) *!* (+1)) p) (0 :!: 0) avgP = uncurry (/) . (foldl' (+) 0 ! foldl' (\x y - x + 1) 0) (*!*) f g (a :!: b) = f a :!: g b (!) f g a = fa `par` (fa,ga) where fa = f a ga = g a \end{code} avg4 is the function which was best among Dan's benchmarks. avgS uses strict tuples. I just threw in avgP for fun, it traverses the lists in parallel. Note: I do have a dual core machine so it makes sense to try avgP. I fed these functions to ghc with the -O2 and -threaded flags and timed them using the list [1..1000]. The result (best times out of several runs): avg4: 284 ms avgS: 184 ms avgP: 248 ms It seems doing a single traversal can be faster if your write your function carefully. Doing the traversal in parallel was beneficial but not as good as the single traversal. Cheers, /Josef ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Fusing foldr's
Always check optimizations to make sure they are not pessimizations! Actually, traversing the list twice is very cheap compared to space leakage, and accumulating pairs requires tuple boxing and unboxing which I don't know how to get GHC not to do. Your avg3 (along with several attempts of mine to fix the problem) gave stack overflows on a large list. Only avg4 below (traversing the list twice with strict accumulation) didn't blow up on large lists, even though avg5 and avg6 were intended to be strict. Prelude Control.Arrow Data.List let avg4 = uncurry (/) . (foldl' (+) 0 foldl' (\x y - x + 1) 0) in avg4 [1..1000] 500.5 -- This took 13 sec on my machine Prelude Control.Arrow Data.List let avg3 = uncurry (/) . foldr (\x (s,n) - (s + x,n + 1)) (0,0) in avg3 [1..1000] *** Exception: stack overflow -- This fails in 1 sec Prelude Control.Arrow Data.List let avg5 = uncurry (/) . foldl' (\(s,n) x - (s + x,n + 1)) (0,0) in avg5 [1..1000] *** Exception: stack overflow -- This fails in 100 sec Prelude Control.Arrow Data.List let avg6 = uncurry (/) . foldl' (\sn x - (fst sn+x,snd sn+1)) (0,0) in avg6 [1..1000] *** Exception: stack overflow -- This fails in 30 sec Prelude Control.Arrow Data.List let avg3 = uncurry (/) . foldr (\n - (+n) *** (+1)) (0, 0) in avg3 [1..1000] *** Exception: stack overflow -- This fails in 2 sec Tim Newsham wrote: Just goofing around with arrows and foldr while reading Hutton's excellent paper on folds (http://www.cs.nott.ac.uk/~gmh/fold.pdf). Wondering if this can be done automatically and more generally? module Main where import Control.Arrow import Data.List -- sum and length expressed as foldr. fsum = foldr (\n - (+n)) 0 flen = foldr (\n - (+1)) 0 -- compute average using arrows.. -- compute the sum of a list, compute the length, and do a divide. -- this traverses the list twice using two foldrs. avg1 = uncurry (/) . (fsum flen) avg2 = uncurry (/) . (foldr (\n - (+n)) 0 foldr (\n - (+1)) 0) -- But the two foldr's can be fused together -- here we're mixing the two foldr constants 0 and 0 to (0,0) -- and we're mixing the two functions (\n - (+n)) and -- (\n - (+1)) to (\n - (+n) *** (+1)). avg3 = uncurry (/) . foldr (\n - (+n) *** (+1)) (0, 0) main = do print $ avg1 [1,2,3,4] print $ avg2 [1,2,3,4] print $ avg3 [1,2,3,4] Tim Newsham http://www.thenewsh.com/~newsham/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Fusing foldr's
Just goofing around with arrows and foldr while reading Hutton's excellent paper on folds (http://www.cs.nott.ac.uk/~gmh/fold.pdf). Wondering if this can be done automatically and more generally? module Main where import Control.Arrow import Data.List -- sum and length expressed as foldr. fsum = foldr (\n - (+n)) 0 flen = foldr (\n - (+1)) 0 -- compute average using arrows.. -- compute the sum of a list, compute the length, and do a divide. -- this traverses the list twice using two foldrs. avg1 = uncurry (/) . (fsum flen) avg2 = uncurry (/) . (foldr (\n - (+n)) 0 foldr (\n - (+1)) 0) -- But the two foldr's can be fused together -- here we're mixing the two foldr constants 0 and 0 to (0,0) -- and we're mixing the two functions (\n - (+n)) and -- (\n - (+1)) to (\n - (+n) *** (+1)). avg3 = uncurry (/) . foldr (\n - (+n) *** (+1)) (0, 0) main = do print $ avg1 [1,2,3,4] print $ avg2 [1,2,3,4] print $ avg3 [1,2,3,4] Tim Newsham http://www.thenewsh.com/~newsham/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe