Repository : ssh://darcs.haskell.org//srv/darcs/ghc On branch : master
http://hackage.haskell.org/trac/ghc/changeset/dfe536be7d5d662ae75671797750b487c1ef59b7 >--------------------------------------------------------------- commit dfe536be7d5d662ae75671797750b487c1ef59b7 Author: Max Bolingbroke <[email protected]> Date: Wed Mar 7 19:44:31 2012 +0000 Give a unfolding argument discount proportional to the number of available arguments Ensures that h1 gets inlined into its use sites in cases like: """ h1 k = k undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined undefined a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) """ I've benchmarked this on nofib (albeit recompiling only the benchmarks, not the library) and it hardly shifts the numbers - binary size is up by 0.1% at most (average 0.0%) and the worst-case allocation increase is 0.2% (best case -0.1%, 0.0% average). If you also rebuild the libraries with this change, the only further change is a +0.2% allocation increase in cacheprof. So this looks like a pretty low-risk change that will considerably benefit certain programs. >--------------------------------------------------------------- compiler/coreSyn/CoreUnfold.lhs | 81 ++++++++++++++++++++++++++++++++++++++- 1 files changed, 79 insertions(+), 2 deletions(-) diff --git a/compiler/coreSyn/CoreUnfold.lhs b/compiler/coreSyn/CoreUnfold.lhs index 96a1abd..64ef6b6 100644 --- a/compiler/coreSyn/CoreUnfold.lhs +++ b/compiler/coreSyn/CoreUnfold.lhs @@ -502,6 +502,81 @@ sizeExpr bOMB_OUT_SIZE top_args expr d2 -- Ignore d1 \end{code} +Note [Function application discount] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +I noticed that the output of the supercompiler generates a lot of code +with this form: + +""" +module Inlining where + +h1 k = k undefined undefined undefined + undefined undefined undefined + undefined undefined undefined + undefined undefined undefined + undefined undefined undefined + undefined undefined undefined + +a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) +b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) +c = h1 (\_ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) +d = h1 (\_ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) +e = h1 (\_ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ -> x) +f = h1 (\_ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ -> x) +g = h1 (\_ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ -> x) +h = h1 (\_ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ -> x) +i = h1 (\_ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ -> x) +j = h1 (\_ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ -> x) +k = h1 (\_ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ -> x) +l = h1 (\_ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ -> x) +m = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ -> x) +n = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ -> x) +o = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ -> x) +p = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ -> x) +q = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ -> x) +r = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x -> x) +""" + +With GHC head the applications of h1 are not inlined, which hurts the +quality of the generated code a bit. I was wondering why h1 wasn't +getting inlined into each of "a" to "i" - after all, it has a manifest +lambda argument. + +It turns out that the code in CoreUnfold gives a fixed discount of +opt_UF_FunAppDiscount to a function argument such as "k" if it applied +to any arguments. This is enough to ensure that h1 is inlined if the number +of arguments applied to k is below a certain limit, but if many arguments are +applied to k then the fixed discount can't overcome the size of the +chain of apps, and h1 is never inlined. + +My proposed solution is to change CoreUnfold.funSize so that longer +chains of arguments being applied to a lambda-bound function give a +bigger discount. The motivation for this is that we would *generally* +expect that the lambda at the callsite has enough lambdas such that +all of the applications within the body can be beta-reduced away. This +change might lead to over eager inlining in cases like this, though: + +{{{ +h1 k = k x y z + +{-# NOINLINE g #-} +g = ... + +main = ... h1 (\x -> g x) ... +}}} + +In this case we aren't able to beta-reduce away all of the +applications in the body of h1 because the lambda at the call site +only binds 1 argument, not the 3 allowed by the type. I don't expect +this case to be particularly common, however. + +I chose the bonus to be (size - 20) so that application to 1 arg got +same bonus as the old fixed bonus (i.e. opt_UF_FunAppDiscount, which is 60). +If you have the bonus being (size - 40) then $fMonad[]_$c>>= with interesting +2nd arg doesn't inline in cryptarithm2 so we lose some deforestation, and +overall binary size hardly falls. + \begin{code} -- | Finds a nominal size of a string literal. litSize :: Literal -> Int @@ -541,13 +616,15 @@ funSize top_args fun n_val_args where some_val_args = n_val_args > 0 + -- See Note [Function application discount] arg_discount | some_val_args && fun `elem` top_args - = unitBag (fun, opt_UF_FunAppDiscount) + = unitBag (fun, opt_UF_FunAppDiscount + (size - 20)) | otherwise = emptyBag -- If the function is an argument and is applied -- to some values, give it an arg-discount - res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount + -- See Note [Function application discount] + res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount + (size - 20) | otherwise = 0 -- If the function is partially applied, show a result discount size | some_val_args = 10 * (1 + n_val_args) _______________________________________________ Cvs-ghc mailing list [email protected] http://www.haskell.org/mailman/listinfo/cvs-ghc
