#4428: Local functions lose their unfoldings
---------------------------------+------------------------------------------
    Reporter:  rl                |        Owner:                         
        Type:  bug               |       Status:  merge                  
    Priority:  normal            |    Milestone:  7.0.2                  
   Component:  Compiler          |      Version:  7.1                    
    Keywords:                    |     Testcase:                         
   Blockedby:                    |   Difficulty:                         
          Os:  Unknown/Multiple  |     Blocking:                         
Architecture:  Unknown/Multiple  |      Failure:  Runtime performance bug
---------------------------------+------------------------------------------

Comment(by rl):

 Replying to [comment:9 simonpj]:
 >
 > > I was under the impression that GHC would inline into unfoldings as
 long as it didn't affect phasing. Has that changed?
 >
 > Only invisibly.  If you write `INLINE[1]`, then it's OK for GHC to
 optimise the rhs-to-be-inlined with phase=1 (only).  Not phase=2, nor
 phase=0, because both would change the above semantics of INLINE.  But
 while its OK to optimise the rhs-to-be-inlined with phase=1, it's not
 necessary, and it complicated things, so I took it out again.

 The previous behaviour was what I meant with "the rhs it would have right
 before phase 0". So this ''has'' changed. It's a pity, actually, I found
 the previous semantics much cleaner.

 > > Anyway, the biggest problem is that local `INLINE` functions are
 optimised twice (the rhs and the unfolding after it's been inlined) and
 usually the rhs is just thrown away so it's completely wasted work. For
 DPH/vector programs, this leads to significantly longer compile times.
 >
 > That seems to be an inherent problem with keeping a template to inline
 separate from the main body of the function.

 Yes. And as I said, doing so makes sense for global functions but IMO not
 for local ones.

 > Let's review what you are trying to acheive.  In your example, the key
 point is that in compositions of `map` all the `step` functions get fused
 together.
 > {{{
 > mapM f (mapM g (Stream step1 s1 n1)
 > = mapM f (Stream step2 s n)
 >   where
 >     step2 s = ...step1...g...
 > = Stream step3 s n
 >   where
 >     step2 s = ...step1...g...
 >     step3 s = ...step2...g...
 > }}}
 > Now you want `step2` to inline into `step3` so that they can fuse, and
 so that f and g meet up and can fuse. (However you don't want `step3` to
 inline into `(Stream step3 s n)` because `Stream` is a data constructor.)

 A better example is this:

 {{{
 foldlM f z (mapM g (Stream step1 s1 n1))
   = let step2 s = case step1 s of
                     Yield x s' -> Yield (g x) s'
                     Skip s'    -> Skip s'
                     Done       -> Done

         loop x s = case step2 s of
                      Yield y s' -> loop (f x y) s'
                      Skip s'    -> loop x s'
                      Done       -> x
     in loop s1 z
 }}}

 Here, I don't really care (I think...) whether `step1` gets inlined into
 `step2` before phase 0 or not. In fact, I would probably prefer `step1` to
 be inlined. But I absolutely don't want `step2` to be inlined into the
 loop before phase 0 because it might introduce unwanted join points there
 if it was more complex. I could delay inlining `step2` at the call site
 but then I'd still need an `INLINE` pragma on it because it might not get
 inlined at all otherwise.

 > Are you saying that you don't want this fusion of the `step` functions
 to take place until phase 0?  If so, a NOINLINE pragma would do.  You
 don't need to say INLINE, because there's only one occurrence of each
 `step` function anyway.

 NOINLINE doesn't work because `step` might get floated out and then not be
 inlined. That's what prompted the original bug report, in fact. It
 ''must'' have an INLINE pragma.

 > Let's suppose there were multiple occurrences of `step`.  You probably
 don't mean "optimise as if there was no pragma, and then inline whatever
 you have in phase 0".  Reason: suppose f or g are gigantic.  Then the
 `step` functions would be gigantic, so duplicating them willy-nilly would
 make gigantic code.

 I don't understand how not inlining until phase 0 would prevent this from
 happening. I think I do mean "optimise as if there was no pragma, and then
 inline whatever you have in phase 0".

 > What goes wrong if you just don't put a pragma on `step`?

 It gets inlined into the loop which sometimes (rarely, but sometimes)
 leads to significantly worse code.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4428#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to