>
> We can see two different recursive loops are generated, each with 10
> iterations, one corresponds to "sum" and the other to "length". So
> yes, "h x" is independently evaluated in both cases.


So if `h x` is `map expensive x`, and x is the stream x1,x2,x3, ... then
we'll compute (expensive x1) twice and similarly (expensive x2).

GHC is usually pathological averse to duplicating expensive computations
like this; I'm not sure how you torture it into doing so!

Simon

On Thu, 11 Dec 2025 at 11:28, Harendra Kumar <[email protected]>
wrote:

> On Thu, 11 Dec 2025 at 14:39, Simon Peyton Jones
> <[email protected]> wrote:
> > But still, I think the key features of stream fusion and foldr/build
> fusion are the same: we must do enough inlining.   INLINE pragmas may help;
> but join points are a problem because they are introduced by the compiler.
>
> I agree inlining is key in both cases, inlining the right functions
> and no more is the goal.
>
> > Your plugin uses a criterion like "if the function consumes or returns a
> fusible type, inline it", although I'm terribly hazy about the precise
> specification.
>
> That is correct at high level, rest are details. Our ultimate goal is
> to not have the fusible constructors at all in the final core.
>
> > That same plugin might well be useful for other RULE-based libraries
> too.  E.g. whether `map` returns explicit constructors (like (:) and []) or
> pseudo-constructors like (build g) doesn't matter.  What matters is that
> the function is inlined.
>
> That is possible. The first step would be to detect such cases and
> warn the user about presence of fusible constructors and therefore a
> missed fusion opportunity.
>
> > Two other thoughts
> >
> > First: inlining join points aggressively can cause exponential code
> blow-up.  Consider
> >   join j1 x = blah
> >   join j2 x = case x of { Left x1 -> j1 x1; Right x2 -> j1 (x2+1) }
> >   join j3 x = case x of { Left x1 -> j2 x1; Right x2 -> j2 (x2+1) }
> >   join j4 x = case x of { Left x1 -> j3 x1; Right x2 -> j3 (x2+1) }
> >   in ...
> > The types aren't right here but you'll get the idea.  If you inline all
> the join points, you unravel a DAG into a tree of exponential size.
> >
> > I'm not sure how you guard against that.
>
> One way is to put an upper limit on the resulting function is -- if it
> exceeds the limit then do not inline, instead warn the user that
> fusion would result in a large intermediate function. The user can
> still force it if they know what they are doing, by relaxing the
> limit.
>
> > Second.  Inlining join points may not be necessary to achieve fusion.
> Section 5 of "Compiling without continuations" gives the idea.    It might
> be worth understanding carefully why this doesn't work in your application.
>
> That's a bit surprising to me that it's possible. I will try to understand
> it.
>
> > Third:
> >>
> >> If someone does write this code, in the case of stream fusion
> >> both f and g will get inlined and fuse with "ys", so fusion is not an
> >> issue. However, "ys" gets duplicated
> >
> >
> > In the example  ys = h x
> > So are you saying that (h x) is computed twice?  That would cause a
> perhaps-asymptotic increase in compile time!  Is that what you get?  You
> could say "don't do that" but it's hard to be completely sure you haven't.
>
> Yes, "h x" is a stream and both f and g are independently consuming
> and fusing it, generating two different fused loops end to end.
>
> For monadic streams in streamly the equivalent code looks like this:
>
> module Example (f) where
>
> import qualified Streamly.Data.Stream as Stream
> import qualified Streamly.Data.Fold as Fold
>
> f :: IO Int
> f = do
>     let ys = Stream.enumerateFromTo 1 (10 :: Int)
>     x <- Stream.fold Fold.sum ys
>     y <- Stream.fold Fold.length ys
>     return (x + y)
>
> And the fused core for the above code looks like this:
>
> Rec {
> f_$s$wgo
>   = \ sc_s2KE sc1_s2KF ww_s2IJ eta_s2IL ->
>       case <=# ww_s2IJ 10# of {
>         __DEFAULT ->
>           joinrec {
>             $s$wgo_s2KM sc2_s2KL sc3_s2KK sc4_s2KJ
>               = case <=# sc3_s2KK 10# of {
>                   __DEFAULT -> (# sc2_s2KL, I# (+# sc_s2KE sc4_s2KJ) #);
>                   1# -> jump $s$wgo_s2KM sc2_s2KL (+# sc3_s2KK 1#) (+#
> sc4_s2KJ 1#)
>                 }; } in
>           jump $s$wgo_s2KM eta_s2IL 1# 0#;
>         1# ->
>           let { incr_s1Xd = -# ww_s2IJ sc1_s2KF } in
>           let { total1_s1Xe = +# sc_s2KE incr_s1Xd } in
>           f_$s$wgo
>             total1_s1Xe
>             (-# (-# total1_s1Xe sc_s2KE) incr_s1Xd)
>             (+# ww_s2IJ 1#)
>             eta_s2IL
>       }
> end Rec }
>
> f1 = \ s_a1u0 -> f_$s$wgo 0# 0# 1# s_a1u0
>
> f = f1 `cast` <Co:3> :: ...
>
> We can see two different recursive loops are generated, each with 10
> iterations, one corresponds to "sum" and the other to "length". So
> yes, "h x" is independently evaluated in both cases.
>
> -harendra
>
_______________________________________________
ghc-devs mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to