#2387: Optimizer misses unboxing opportunity
------------------------------------+---------------------------------------
Reporter: dolio | Owner:
Type: bug | Status: new
Priority: normal | Component: Compiler
Version: 6.8.2 | Severity: minor
Keywords: optimizer unbox box | Testcase:
Architecture: x86_64 (amd64) | Os: Linux
------------------------------------+---------------------------------------
In my studying of the fannkuch benchmark, I've discovered (I think)
another missed optimization. A scaled down illustration goes as follows:
{{{
{-# LANGUAGE TypeOperators, BangPatterns #-}
module Main (main) where
import Control.Monad.ST
import System.Environment
data (:*:) a b = !a :*: !b
whileLoop :: Int -> ST s Int
whileLoop = go 0
where
go !n k
| k == 0 = return n
| otherwise = go (n+1) (k-1)
{-# INLINE whileLoop #-}
iter :: Int -> Int -> ST s (Bool :*: Int)
iter k n = do
k' <- whileLoop 40 >>= \k' -> return $! max k k'
b <- return (n == 0)
return $! b :*: k'
{-# INLINE iter #-}
mainLoop :: Int -> Int -> ST s Int
mainLoop k n = do
done :*: k' <- iter k n
if done
then return k'
else mainLoop k' (n - 1)
main = print =<< stToIO . mainLoop 0 . read . head =<< getArgs
}}}
If we look at the core for whileLoop's worker, we see:
{{{
$wpoly_go_r1aE :: forall s_aem.
Int# -> Int# -> STRep s_aem Int
$wpoly_go_r1aE =
\ (@ s_aem)
(ww_s18Y :: Int#)
(ww1_s192 :: Int#)
(eta_s19w :: State# s_aem) ->
case ww1_s192 of wild_XF {
__DEFAULT ->
$wpoly_go_r1aE @ s_aem (+# ww_s18Y 1) (-# wild_XF 1) eta_s19w;
0 -> (# eta_s19w, I# ww_s18Y #)
}
}}}
Note, the return type is a boxed Int. The function is only used once, like
so:
{{{
...
case $wpoly_go_r1aE @ s_aem 0 40 w_s19f of wild_aFw { (# new_s_aFB,
r_aFC #) ->
case r_aFC of wild1_aG9 { I# y_aGb ->
case <=# ww_s199 y_aGb of wild2_aGd {
False ->
...
}}}
In other words, go boxes its results at the end of the loop, and the
function that uses it immediately looks inside the box for the value. In
this particular micro-benchmark, the boxed value (wild1_aG9 above) is
actually used in the case where mainLoop returns the boxed value. However,
in the larger benchmark I pulled this from, that is not the case in
several areas (only the unboxed value is used, but it still goes through a
box). Either way, the boxed value is only used at the end of the loop (and
not every time there), and on every other iteration, this results in
superfluous allocation.
I'll attach a manually unboxed version I wrote (it also has iter and max
manually inlined to mainLoop, since that makes it easier to write; the
core for the above shows that they are getting inlined properly, so I
assume that isn't the issue). Using +RTS -sstderr shows that (running 100
million iterations here), the manually unboxed version allocates 50
kilobytes on the heap, and runs in around 15 seconds, whereas the version
above that doesn't get unboxed does 1.6 gigabytes of heap allocation, and
takes 18 seconds (in the larger benchmark, such extra boxing would happen
perhaps 40 million times over the course of the program).
Thanks for your help.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/2387>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs