> > We can see two different recursive loops are generated, each with 10 > iterations, one corresponds to "sum" and the other to "length". So > yes, "h x" is independently evaluated in both cases.
So if `h x` is `map expensive x`, and x is the stream x1,x2,x3, ... then we'll compute (expensive x1) twice and similarly (expensive x2). GHC is usually pathological averse to duplicating expensive computations like this; I'm not sure how you torture it into doing so! Simon On Thu, 11 Dec 2025 at 11:28, Harendra Kumar <[email protected]> wrote: > On Thu, 11 Dec 2025 at 14:39, Simon Peyton Jones > <[email protected]> wrote: > > But still, I think the key features of stream fusion and foldr/build > fusion are the same: we must do enough inlining. INLINE pragmas may help; > but join points are a problem because they are introduced by the compiler. > > I agree inlining is key in both cases, inlining the right functions > and no more is the goal. > > > Your plugin uses a criterion like "if the function consumes or returns a > fusible type, inline it", although I'm terribly hazy about the precise > specification. > > That is correct at high level, rest are details. Our ultimate goal is > to not have the fusible constructors at all in the final core. > > > That same plugin might well be useful for other RULE-based libraries > too. E.g. whether `map` returns explicit constructors (like (:) and []) or > pseudo-constructors like (build g) doesn't matter. What matters is that > the function is inlined. > > That is possible. The first step would be to detect such cases and > warn the user about presence of fusible constructors and therefore a > missed fusion opportunity. > > > Two other thoughts > > > > First: inlining join points aggressively can cause exponential code > blow-up. Consider > > join j1 x = blah > > join j2 x = case x of { Left x1 -> j1 x1; Right x2 -> j1 (x2+1) } > > join j3 x = case x of { Left x1 -> j2 x1; Right x2 -> j2 (x2+1) } > > join j4 x = case x of { Left x1 -> j3 x1; Right x2 -> j3 (x2+1) } > > in ... > > The types aren't right here but you'll get the idea. If you inline all > the join points, you unravel a DAG into a tree of exponential size. > > > > I'm not sure how you guard against that. > > One way is to put an upper limit on the resulting function is -- if it > exceeds the limit then do not inline, instead warn the user that > fusion would result in a large intermediate function. The user can > still force it if they know what they are doing, by relaxing the > limit. > > > Second. Inlining join points may not be necessary to achieve fusion. > Section 5 of "Compiling without continuations" gives the idea. It might > be worth understanding carefully why this doesn't work in your application. > > That's a bit surprising to me that it's possible. I will try to understand > it. > > > Third: > >> > >> If someone does write this code, in the case of stream fusion > >> both f and g will get inlined and fuse with "ys", so fusion is not an > >> issue. However, "ys" gets duplicated > > > > > > In the example ys = h x > > So are you saying that (h x) is computed twice? That would cause a > perhaps-asymptotic increase in compile time! Is that what you get? You > could say "don't do that" but it's hard to be completely sure you haven't. > > Yes, "h x" is a stream and both f and g are independently consuming > and fusing it, generating two different fused loops end to end. > > For monadic streams in streamly the equivalent code looks like this: > > module Example (f) where > > import qualified Streamly.Data.Stream as Stream > import qualified Streamly.Data.Fold as Fold > > f :: IO Int > f = do > let ys = Stream.enumerateFromTo 1 (10 :: Int) > x <- Stream.fold Fold.sum ys > y <- Stream.fold Fold.length ys > return (x + y) > > And the fused core for the above code looks like this: > > Rec { > f_$s$wgo > = \ sc_s2KE sc1_s2KF ww_s2IJ eta_s2IL -> > case <=# ww_s2IJ 10# of { > __DEFAULT -> > joinrec { > $s$wgo_s2KM sc2_s2KL sc3_s2KK sc4_s2KJ > = case <=# sc3_s2KK 10# of { > __DEFAULT -> (# sc2_s2KL, I# (+# sc_s2KE sc4_s2KJ) #); > 1# -> jump $s$wgo_s2KM sc2_s2KL (+# sc3_s2KK 1#) (+# > sc4_s2KJ 1#) > }; } in > jump $s$wgo_s2KM eta_s2IL 1# 0#; > 1# -> > let { incr_s1Xd = -# ww_s2IJ sc1_s2KF } in > let { total1_s1Xe = +# sc_s2KE incr_s1Xd } in > f_$s$wgo > total1_s1Xe > (-# (-# total1_s1Xe sc_s2KE) incr_s1Xd) > (+# ww_s2IJ 1#) > eta_s2IL > } > end Rec } > > f1 = \ s_a1u0 -> f_$s$wgo 0# 0# 1# s_a1u0 > > f = f1 `cast` <Co:3> :: ... > > We can see two different recursive loops are generated, each with 10 > iterations, one corresponds to "sum" and the other to "length". So > yes, "h x" is independently evaluated in both cases. > > -harendra >
_______________________________________________ ghc-devs mailing list -- [email protected] To unsubscribe send an email to [email protected]
