Re: [Haskell-cafe] Performance Issue
Richard, I found this very annoying when I first realised GHC doesn't do any CSE, given that the result of pure functions only depend on their parameters. Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :) I think, there should at least be a compiler flag or something similar to enforce CSE on some structures. But currently, as others pointed out, you need to bind and do your sub expression elimination yourself. Best, On 18 May 2010 17:30, Richard Warburton richard.warbur...@gmail.com wrote: A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression: main = print $ newton 4 24 newton a 0 = a newton a n = ((newton a (n-1))^2 + a)/(2*(newton a (n-1))) Compiled with 'ghc -O3 --make perf.hs', results in: real0m5.544s user0m5.492s sys 0m0.008s However if we factor out the repeated call to the newton method: main = print $ newton2 4 24 newton2 a 0 = a newton2 a n = (x^2 + a)/(2*x) where x = newton2 a (n-1) real0m0.004s user0m0.004s sys 0m0.004s It looks to me like Referential transparency should make this a sound optimisation to apply, but ghc isn't doing it even on -O3. regards, Richard ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe -- Ozgur Akgun ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
Hello, Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :) Do you (or others) have any particular mean but understandable example? Cheers, Michael ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
Michael Lesniak mlesn...@uni-kassel.de writes: Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :) Do you (or others) have any particular mean but understandable example? http://www.haskell.org/haskellwiki/GHC:FAQ#Does_GHC_do_common_subexpression_elimination.3F All hail Google! :p -- Ivan Lazar Miljenovic ivan.miljeno...@gmail.com IvanMiljenovic.wordpress.com ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
Hello, http://www.haskell.org/haskellwiki/GHC:FAQ#Does_GHC_do_common_subexpression_elimination.3F All hail Google! :p You're right :D. Thanks! - Michael -- Dipl.-Inf. Michael C. Lesniak University of Kassel Programming Languages / Methodologies Research Group Department of Computer Science and Electrical Engineering Wilhelmshöher Allee 73 34121 Kassel Phone: +49-(0)561-804-6269 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
On Saturday 22 May 2010 15:00:25, Thomas Schilling wrote: Actually, in this case it would be safe to do CSS. Because a) the function is strict in both arguments so GHC creates a worker which only uses unboxed types b) this cannot cause any space leaks (it contains no pointers) The generated Core does look pretty weird, though: $wnewton = \ (ww_s115 :: Double#) (ww1_s119 :: Int#) - case ww1_s119 of ds_Xr8 { __DEFAULT - case ^_r11D (case $wnewton ww_s115 (-# ds_Xr8 1) of ww2_s11d { __DEFAULT - D# ww2_s11d -- box the result of $wnewton }) lvl_r11B of _ { D# x_avk --- unbox it again The boxing is due to the use of (^). If you write x*x instead of x^2, it can use the primop *## and needn't box it. As a side effect, the original time leak probably wouldn't have occured with x*x instead of x^2 because one would've made it let x = newton a (n-1) in (x*x +a) / (2*x) instead of writing out newton a (n-1) thrice anyway, wouldn't one? case $wnewton ww_s115 (-# ds_Xr8 1) of ww2_s11d { __DEFAULT - /## (+## x_avk ww_s115) (*## 2.0 ww2_s11d) } }; 0 - ww_s115 } ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
On Saturday 22 May 2010 16:48:27, Daniel Fischer wrote: The boxing is due to the use of (^). If you write x*x instead of x^2, it can use the primop *## and needn't box it. As a side effect, the original time leak probably wouldn't have occured with x*x instead of x^2 because one would've made it let x = newton a (n-1) in (x*x +a) / (2*x) instead of writing out newton a (n-1) thrice anyway, wouldn't one? Even if. With newton :: Double - Int - Double newton a 0 = a newton a n = (((newton a (n-1)) * (newton a (n-1)) ) + a)/(2*(newton a (n-1))) (and optimisations of course), GHC does share newton a (n-1). Lesson: Writing x^2 is a baad thing. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
On May 22, 2010, at 08:30 , Ozgur Akgun wrote: Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :) They're not actually that obscure. Most of them can be summarized thusly: anything that would cause a fold to cause a stack or heap overflow if you get the strictness wrong is likely to cause the same problems if CSE optimizes it, for much the same reason. And, worse, if you have CSE, the only way to avoid this would be to disable it; the current situation at least permits you to manually perform CSE via let-binding. (In particular, it's often impossible to manipulate strictness; consider computing an average over a list as sum / length (obscure?): the only way to avoid a large list exploding in memory usage is to pull apart sum and length and thread them together, *no* strictness annotation will help.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu electrical and computer engineering, carnegie mellon universityKF8NH PGP.sig Description: This is a digitally signed message part ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
On 22 May 2010 16:06, Daniel Fischer daniel.is.fisc...@web.de wrote: On Saturday 22 May 2010 16:48:27, Daniel Fischer wrote: The boxing is due to the use of (^). If you write x*x instead of x^2, it can use the primop *## and needn't box it. As a side effect, the original time leak probably wouldn't have occured with x*x instead of x^2 because one would've made it let x = newton a (n-1) in (x*x +a) / (2*x) instead of writing out newton a (n-1) thrice anyway, wouldn't one? Even if. With newton :: Double - Int - Double newton a 0 = a newton a n = (((newton a (n-1)) * (newton a (n-1)) ) + a)/(2*(newton a (n-1))) (and optimisations of course), GHC does share newton a (n-1). Lesson: Writing x^2 is a baad thing. Interesting. Clearly GHC needs a better partial evaluator! :) (^) is not inlined because it's recursive (or rather it's worker is) and there also is no SPECIALISE pragma for Double - Integer - Double. Yes, it's Integer, not Int, because the literal 2 defaults to Integer. It doesn't seem to be possible to add SPECIALISE pragmas for non-local functions. If I copy over the definition of (^) no pragma is needed. GHC creates an worker for Double# - Integer - Double# and that seems to be sufficient to make CSE work. -- Push the envelope. Watch it bend. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Performance Issue
A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression: main = print $ newton 4 24 newton a 0 = a newton a n = ((newton a (n-1))^2 + a)/(2*(newton a (n-1))) Compiled with 'ghc -O3 --make perf.hs', results in: real0m5.544s user0m5.492s sys 0m0.008s However if we factor out the repeated call to the newton method: main = print $ newton2 4 24 newton2 a 0 = a newton2 a n = (x^2 + a)/(2*x) where x = newton2 a (n-1) real0m0.004s user0m0.004s sys 0m0.004s It looks to me like Referential transparency should make this a sound optimisation to apply, but ghc isn't doing it even on -O3. regards, Richard ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
On Tue, May 18, 2010 at 9:30 AM, Richard Warburton richard.warbur...@gmail.com wrote: A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression: GHC performs almost no common subexpression elimination, the reasons being that it can introduce space leaks and undesired extra laziness. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression: GHC performs almost no common subexpression elimination, the reasons being that it can introduce space leaks and undesired extra laziness. Is there any way to encourage it to do so, for example compilation flags? Or is it generally best to hand apply these kind of optimisations. regards, Richard ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
2010/5/18 Richard Warburton richard.warbur...@gmail.com: GHC performs almost no common subexpression elimination, the reasons being that it can introduce space leaks and undesired extra laziness. Is there any way to encourage it to do so, for example compilation flags? Or is it generally best to hand apply these kind of optimisations. I think that handmade common expression elimination improves overall quality of code. ;) (as it amounts to refactoring it) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
On Tue, May 18, 2010 at 9:43 AM, Richard Warburton richard.warbur...@gmail.com wrote: Is there any way to encourage it to do so, for example compilation flags? No. It would be difficult for the compiler to see when CSE is or is not safe to apply, and so it doesn't have any code to perform full CSE at all. Or is it generally best to hand apply these kind of optimisations. Since the compiler won't do it for you, the answer is yes :-) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] performance issue and HGL
Hello, I've created a simple Mandelbrot set generator, using HGL, the source is viewable at: http://pastebin.dqd.cz/cUmg/ 1. The problem is, that it is very slow. It is obvious that what takes the most time is the mandel function computation. I have no idea, how it can be improved. 2. What is the easiest way of changing a color and plotting a pixel with it? Do I always have to create a pen for it? Isn't there a simple way? Already – using a line to plot a pixel is probably not the best way… Thanks___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] Performance Issue
i'm implementing a benchmark which includes a detailed specification for a random number generator. for any of the kernels outlined in the benchmark, i might have to generate a set of random numbers R, which has a length n, using the following formulas: R[k] = ((2^-46)(X[k])) mod 2^46, where X[k] = (a^k)s where the values of a and s are constant and defined below. many of the kernels in the benchmark require a large number of randoms to be generated (in the tens of millions). when i invoke the following getRandAt function that many times to build up a list, evaluation of the list takes forever (somewhere between 5 and 10 minutes). i've tried optimizing this several different ways, with no luck. i though i might post my code here and see if anyone notices anything i'm doing wrong that might be causing such a large bottleneck: --constants a :: Int64 a = 5^13 divisor :: Int64 divisor = 2^46 multiplier :: Float multiplier = 2**(-46) --gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 - Int64 - Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor thanks all in advance for your help! -james ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
james.swaine: i'm implementing a benchmark which includes a detailed specification for a random number generator. for any of the kernels outlined in the benchmark, i might have to generate a set of random numbers R, which has a length n, using the following formulas: R[k] = ((2^-46)(X[k])) mod 2^46, where X[k] = (a^k)s where the values of a and s are constant and defined below. many of the kernels in the benchmark require a large number of randoms to be generated (in the tens of millions). when i invoke the following getRandAt function that many times to build up a list, evaluation of the list takes forever (somewhere between 5 and 10 minutes). i've tried optimizing this several different ways, with no luck. i though i might post my code here and see if anyone notices anything i'm doing wrong that might be causing such a large bottleneck: --constants a :: Int64 a = 5^13 divisor :: Int64 divisor = 2^46 multiplier :: Float multiplier = 2**(-46) --gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 - Int64 - Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor thanks all in advance for your help! Using ghc -O2 --make There's nothing wrong with this code, really: Z.$wgetRandAt :: Int# - Int# - Float# and an inner loop of: Z.$w$j :: Int# - Float# Z.$w$j = \ (w_sHx :: Int#) - case Z.^1 Z.lit1 Z.lvl of w1_XFs { I64# ww_XFv - case ww_XFv of wild_aFB { __DEFAULT - case minBound3 of wild1_aFC { I64# b1_aFE - case ==# w_sHx b1_aFE of wild2_aFG { False - case modInt# w_sHx wild_aFB of wild3_aFJ { __DEFAULT - timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# wild3_aFJ) }; True - case wild_aFB of wild3_aFM { __DEFAULT - case modInt# w_sHx wild3_aFM of wild4_aFN { __DEFAULT - timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# wild4_aFN) }; (-1) - overflowError `cast` (CoUnsafe (forall a_aFS. a_aFS) Float# :: forall a_aFS. a_aFS ~ Float#) } } }; 0 - divZeroError `cast` (CoUnsafe (forall a_aFT. a_aFT) Float# :: forall a_aFT. a_aFT ~ Float#) } Which is just fine. Inlining those constants explicitly might be a good idea, then we get an outer loop of: Z.$wgetRandAt :: Int# - Int# - Float# Z.$wgetRandAt = \ (ww_sHG :: Int#) (ww1_sHK :: Int#) - case ww_sHG of wild_B1 { __DEFAULT - case Z.lvl3 of wild1_aEd { I64# x#_aEf - case Z.^ Z.a (I64# wild_B1) of wild2_XFe { I64# x#1_XFh - case Z.lvl1 of w_aE7 { I64# ww2_aE9 - case ww2_aE9 of wild3_aFB { __DEFAULT - case minBound3 of wild11_aFC { I64# b1_aFE - let { ww3_aFz [ALWAYS Just L] :: Int# ww3_aFz = *# x#1_XFh ww1_sHK } in case ==# ww3_aFz b1_aFE of wild21_aFG { False - case modInt# ww3_aFz wild3_aFB of wild31_aFJ { __DEFAULT - Z.$w$j (*# x#_aEf wild31_aFJ) }; True - case wild3_aFB of wild31_aFM { __DEFAULT - case modInt# ww3_aFz wild31_aFM of wild4_aFN { __DEFAULT - Z.$w$j (*# x#_aEf wild4_aFN) }; (-1) - overflowError `cast` (CoUnsafe (forall a_aFS. a_aFS) Float# :: forall a_aFS. a_aFS ~ Float#) } 0 - divZeroError `cast` (CoUnsafe (forall a_aFT. a_aFT) Float# :: forall a_aFT. a_aFT ~ Float#) } } } }; 0 - timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# ww1_sHK) } which is the fast path, then error / bounds checking. this looks perfectly acceptable. What does sound troublesome is using lazy lists .. that's more likely to be the bottleneck. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
2009/2/26 James Swaine james.swa...@gmail.com --gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 - Int64 - Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor One thing that comes to mind is that this exponentiation, with a very big exponent, could potentially take a very long time. I believe that GHC implements (^) using a repeated squaring technique, so it runs in log(k) time, which ought to be no problem. I'm not sure about other compilers though. Also note: (a^k * seed) `mod` divisor = ((a^k `mod` divisor) * seed) `mod` divisor = (a^(k `mod` phi(divisor)) * seed) `mod` divisor. Where phi is the Euler totient function: phi(2^46) = 2^23. Modulo errors... it's been a while since I've done this stuff. Luke ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Performance Issue
Am Donnerstag, 26. Februar 2009 18:48 schrieb Luke Palmer: 2009/2/26 James Swaine james.swa...@gmail.com --gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 - Int64 - Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor One thing that comes to mind is that this exponentiation, with a very big exponent, could potentially take a very long time. I believe that GHC implements (^) using a repeated squaring technique, so it runs in log(k) time, which ought to be no problem. I'm not sure about other compilers though. Another thing: if you don't need to pick random indices, but use them in order, it may be faster to have a list of the random numbers : randInts = iterate ((`mod` divisor) . (*a)) seed or carry x[k] around as state. Also note: (a^k * seed) `mod` divisor = ((a^k `mod` divisor) * seed) `mod` divisor = (a^(k `mod` phi(divisor)) * seed) `mod` divisor. Where phi is the Euler totient function: phi(2^46) = 2^23. phi(2^n) = 2^(n-1). Apart from that, correct. Modulo errors... it's been a while since I've done this stuff. Luke ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe