Re: A future for fusion

Simon Peyton Jones Thu, 11 Dec 2025 01:09:57 -0800

>
>  foldr/build relies on identifying the respective
> functions and their rewrite rules, while stream fusion relies on
> recognizing the constructors involved.



Ah, I think this is what I was missing.  Stream fusion does not use RULEs
at all.  (Except for the (stream . unstream = id) rule, which is needed
only in the conversion to/from regular lists, which you probably aboie.)

But still, I think the key features of stream fusion and foldr/build fusion
are the same: *we must do enough inlining*.   INLINE pragmas may help; but
join points are a problem because they are introduced by the compiler.

Your plugin uses a criterion like "if the function consumes or returns a
fusible type, inline it", although I'm terribly hazy about the precise
specification.

That same plugin might well be useful for other RULE-based libraries too.
E.g. whether `map` returns explicit constructors (like (:) and []) or
pseudo-constructors like (build g) doesn't matter.  What matters is that
the function is inlined.

Two other thoughts

First: inlining join points aggressively can cause exponential code
blow-up.  Consider
  join j1 x = blah
  join j2 x = case x of { Left x1 -> j1 x1; Right x2 -> j1 (x2+1) }
  join j3 x = case x of { Left x1 -> j2 x1; Right x2 -> j2 (x2+1) }
  join j4 x = case x of { Left x1 -> j3 x1; Right x2 -> j3 (x2+1) }
  in ...
The types aren't right here but you'll get the idea.  If you inline all the
join points, you unravel a DAG into a tree of exponential size.

I'm not sure how you guard against that.

Second.  Inlining join points may not be necessary to achieve fusion.
Section 5 of "Compiling without continuations
<https://simon.peytonjones.org/compiling-without-continuations/>" gives the
idea.    It might be worth understanding carefully why this doesn't work in
your application.

Third:

> If someone does write this code, in the case of stream fusion
> both f and g will get inlined and fuse with "ys", so fusion is not an
> issue. However, "ys" gets duplicated


In the example  ys = h x
So are you saying that (h x) is computed twice?  That would cause a
perhaps-asymptotic increase in compile time!  Is that what you get?  You
could say "don't do that" but it's hard to be completely sure you haven't.

Simon



On Thu, 11 Dec 2025 at 08:32, Harendra Kumar <[email protected]>
wrote:

> On Tue, 9 Dec 2025 at 15:19, Simon Peyton Jones
> <[email protected]> wrote:
>
> >
> > I'm not really getting this distinction.   foldr/build fusion eliminates
> intermediate lists (cons cells); stream fusion eliminates intermediate Step
> constructors.   What is the difference, really?
>
> My point was that these two fusion approaches have very different
> implementations: foldr/build relies on identifying the respective
> functions and their rewrite rules, while stream fusion relies on
> recognizing the constructors involved. It wasn’t clear which one we
> were discussing. Jaro was coming from the point of view of the vector
> library, where fusion is based on the Step constructor, and then he
> mentioned rewrite rules, which don’t apply to stream fusion. So I’m
> unsure of the goal: is it improving list fusion, or making
> vector-style stream fusion more reliable, possibly by integrating
> fusion-plugin ideas into GHC.
>
> > The only difference in my mind is that you can declare the entire Step
> data type to be fusible, thereby driving aggressive inlining.  Step is
> *only* an intermediate data type.    But we can't do that for lists because
> they are ubiquitous.
>
> In the list fusion case, constructor marking scheme will not apply, in
> this case we will have to identify if a function results in
> foldr/build primitives and if a rewrite rule exists that can eliminate
> it, if so we can mark the function with INLINE. So the corresponding
> concept here would be to mark and find the foldr/build functions
> rather than the list constructors. However, this is already achieved
> by manually marking all such functions to be inlined, though it may be
> fragile. In case of stream-fusion also the programmer marks the
> involved function with INLINE, but the core problem that the
> fusion-plugin solves is that the internal join-points/let bindings
> created by GHC during core-2-core transformations are not marked
> inline, therefore, the programmer loses control over inlining. I am
> not sure if such a problem even exists in case of list fusion.
>
> > BTW Harendra, what happens if you have sharing
> >
> >    let ys :: Stream Int = h x
> >    in f ys + g ys
> >
> > where one Stream is consumed by two consumers, f and g.   Now it's not
> so easy to eliminate the intermediate, and aggressively inlining f, g, and
> h will simply increase code size.
>
> First of all, I would discourage writing code which duplicates the
> stream. The recommended approach is to use fold combinators to combine
> the folds "f" and "g" here, this is achieved by using the "tee" or
> "teeWith" combinators in streamly. The resulting fold consumes a
> single stream and distributes it to both the folds, achieving complete
> fusion. If someone does write this code, in the case of stream fusion
> both f and g will get inlined and fuse with "ys", so fusion is not an
> issue. However, "ys" gets duplicated which has its own ramifications
> in impure code, but that's a different issue altogether.
>
> -harendra
>

_______________________________________________
ghc-devs mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Re: A future for fusion

Reply via email to