[commit: ghc] master: Give a unfolding argument discount proportional to the number of available arguments (dfe536b)

Max Bolingbroke Wed, 07 Mar 2012 12:31:02 -0800

Repository : ssh://darcs.haskell.org//srv/darcs/ghc

On branch  : master


http://hackage.haskell.org/trac/ghc/changeset/dfe536be7d5d662ae75671797750b487c1ef59b7

>---------------------------------------------------------------

commit dfe536be7d5d662ae75671797750b487c1ef59b7
Author: Max Bolingbroke <[email protected]>
Date:   Wed Mar 7 19:44:31 2012 +0000

    Give a unfolding argument discount proportional to the number of available 
arguments
    
    Ensures that h1 gets inlined into its use sites in cases like:
    
    """
    h1 k = k undefined undefined undefined
            undefined undefined undefined
            undefined undefined undefined
            undefined undefined undefined
            undefined undefined undefined
            undefined undefined undefined
    
    a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
    b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
    """
    
    I've benchmarked this on nofib (albeit recompiling only the
    benchmarks, not the library) and it hardly shifts the numbers - binary
    size is up by 0.1% at most (average 0.0%) and the worst-case
    allocation increase is 0.2% (best case -0.1%, 0.0% average).
    
    If you also rebuild the libraries with this change, the only further
    change is a +0.2% allocation increase in cacheprof. So this looks like
    a pretty low-risk change that will considerably benefit certain
    programs.

>---------------------------------------------------------------

 compiler/coreSyn/CoreUnfold.lhs |   81 ++++++++++++++++++++++++++++++++++++++-
 1 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/compiler/coreSyn/CoreUnfold.lhs b/compiler/coreSyn/CoreUnfold.lhs
index 96a1abd..64ef6b6 100644
--- a/compiler/coreSyn/CoreUnfold.lhs
+++ b/compiler/coreSyn/CoreUnfold.lhs
@@ -502,6 +502,81 @@ sizeExpr bOMB_OUT_SIZE top_args expr
                                  d2  -- Ignore d1
 \end{code}
 
+Note [Function application discount]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+I noticed that the output of the supercompiler generates a lot of code
+with this form:
+
+"""
+module Inlining where
+
+h1 k = k undefined undefined undefined
+        undefined undefined undefined
+        undefined undefined undefined
+        undefined undefined undefined
+        undefined undefined undefined
+        undefined undefined undefined
+
+a = h1 (\x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
+b = h1 (\_ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
+c = h1 (\_ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
+d = h1 (\_ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
+e = h1 (\_ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ _ -> x)
+f = h1 (\_ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ _ -> x)
+g = h1 (\_ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ _ -> x)
+h = h1 (\_ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ _ -> x)
+i = h1 (\_ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ _ -> x)
+j = h1 (\_ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ _ -> x)
+k = h1 (\_ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ _ -> x)
+l = h1 (\_ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ _ -> x)
+m = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ _ -> x)
+n = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ _ -> x)
+o = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ _ -> x)
+p = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ _ -> x)
+q = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x _ -> x)
+r = h1 (\_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ x -> x)
+"""
+
+With GHC head the applications of h1 are not inlined, which hurts the
+quality of the generated code a bit. I was wondering why h1 wasn't
+getting inlined into each of "a" to "i" - after all, it has a manifest
+lambda argument.
+
+It turns out that the code in CoreUnfold gives a fixed discount of
+opt_UF_FunAppDiscount to a function argument such as "k" if it applied
+to any arguments. This is enough to ensure that h1 is inlined if the number
+of arguments applied to k is below a certain limit, but if many arguments are
+applied to k then the fixed discount can't overcome the size of the
+chain of apps, and h1 is never inlined.
+
+My proposed solution is to change CoreUnfold.funSize so that longer
+chains of arguments being applied to a lambda-bound function give a
+bigger discount. The motivation for this is that we would *generally*
+expect that the lambda at the callsite has enough lambdas such that
+all of the applications within the body can be beta-reduced away. This
+change might lead to over eager inlining in cases like this, though:
+
+{{{
+h1 k = k x y z
+
+{-# NOINLINE g #-}
+g = ...
+
+main = ... h1 (\x -> g x) ...
+}}}
+
+In this case we aren't able to beta-reduce away all of the
+applications in the body of h1 because the lambda at the call site
+only binds 1 argument, not the 3 allowed by the type. I don't expect
+this case to be particularly common, however.
+
+I chose the bonus to be (size - 20) so that application to 1 arg got
+same bonus as the old fixed bonus (i.e. opt_UF_FunAppDiscount, which is 60).
+If you have the bonus being (size - 40) then $fMonad[]_$c>>= with interesting
+2nd arg doesn't inline in cryptarithm2 so we lose some deforestation, and
+overall binary size hardly falls.
+
 \begin{code}
 -- | Finds a nominal size of a string literal.
 litSize :: Literal -> Int
@@ -541,13 +616,15 @@ funSize top_args fun n_val_args
   where
     some_val_args = n_val_args > 0
 
+        -- See Note [Function application discount]
     arg_discount | some_val_args && fun `elem` top_args
-                = unitBag (fun, opt_UF_FunAppDiscount)
+                = unitBag (fun, opt_UF_FunAppDiscount + (size - 20))
                 | otherwise = emptyBag
        -- If the function is an argument and is applied
        -- to some values, give it an arg-discount
 
-    res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount
+        -- See Note [Function application discount]
+    res_discount | idArity fun > n_val_args = opt_UF_FunAppDiscount + (size - 
20)
                 | otherwise                = 0
         -- If the function is partially applied, show a result discount
     size | some_val_args = 10 * (1 + n_val_args)



_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

[commit: ghc] master: Give a unfolding argument discount proportional to the number of available arguments (dfe536b)

Reply via email to