Re: A future for fusion

Matthew Pickering Wed, 10 Dec 2025 06:31:59 -0800

When it comes to fusion of `Generics`, we wrote a paper about
performing that fusion in a multi-stage language:
https://mpickering.github.io/papers/staged-sop.pdf


In general there are soundness issues with Typed Template Haskell
which make programming like this brittle in practice, which is what I
discussed in my PhD thesis.

Matt

On Tue, Dec 9, 2025 at 4:35 PM Iavor Diatchki <[email protected]> wrote:
>
> Hello,
> I am sure folks are aware of this, but since it has not been mentioned 
> another example of useful fusion is `to . from` from `Generics`, where we'd 
> like the representation types to never exist at runtime.  I think 
> conceptually the problem is similar, but it's a bit interesting in that it 
> has a different flavor to streams, and also there's a type family involved.
> Cheers,
> Iavor
>
> On Tue, Dec 9, 2025 at 5:27 AM Sebastian Graf <[email protected]> wrote:
>>
>> (Perhaps I should have said "returning `Step`" instead of "involving 
>> `Step`".)
>>
>> Am Di., 9. Dez. 2025 um 14:26 Uhr schrieb Sebastian Graf 
>> <[email protected]>:
>>>
>>> I agree that multi-stage programming is the principled green field 
>>> solution. It provides the right model in which to think about solutions for 
>>> GHC.
>>> For GHC, that model certainly means "Step is just an intermediate data 
>>> structure", i.e., any computation involving `Step` should only live at 
>>> compile-time ("template polymorphism") and it should be inlined/specialised 
>>> for aggressively.
>>>
>>> This is not too different for the situation with list fusion. Any compiled 
>>> program that *runs* `map`/`filter`/`foldr`/`build` has a performance bug. 
>>> You always want to end up with `foldr`'s SAT'd unfolding/explicit 
>>> `(:)`/`[]` instead of a higher-order function `build` which does the same 
>>> thing. That is achieved by the intricate multiphase rewrite rule framework.
>>> If all goes well, the final program never contains any lists whatsoever; 
>>> only a nesting of tight recursive loops. Note that there is no code sharing 
>>> of `map`/`filter`/`foldr` here; they must all have been inlined and 
>>> rewritten and were purely a compile-time thing.
>>>
>>> We want the same for stream fusion because it has better fusion support for 
>>> `zipWith`. Inlining any function involving `Step` is absolutely the right 
>>> thing to do! Aggressive specialization as well, but that's really difficult 
>>> to do using SpecConstr because it blows up so easily. The stream fusion 
>>> paper instead suggests applying the static argument transformation in 
>>> section 7.2. A draft patch of mine tried to do that on the fly to compute 
>>> unfoldings for recursive functions with static args: 
>>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4553. It lacks 
>>> polishing and my time and interest, but I firmly believe this should help 
>>> the stream fusion use case, because SAT+inlining is *exactly* 
>>> specialization of recursive functions for static arguments.
>>> (Side Note: Andrasz pointed out to me two years ago that there were further 
>>> complications which I don't recall and which mean that he settled on the 
>>> practical compromise of going for foldr/build fusion and enabling zip 
>>> fusion/parallel for with stream fusion.)
>>>
>>> ---
>>>
>>> > what happens if you have sharing
>>> >   let ys :: Stream Int = h x
>>> >   in f ys + g ys
>>>
>>> This is a flaw in the way that GHC says "lists are streams". IMO a stream 
>>> (such as in streamly) should happily duplicate all the work that goes into 
>>> computing the elements of `ys`, since it is just code. By contrast, data 
>>> structures such as lists do not have that luxury. `let ys :: [Int] = ...; 
>>> in f ys ++ g ys` should not try to fuse `ys`.
>>> Here's a repo that implements just foldr/build fusion without rewriting 
>>> lists to/from it: https://github.com/sgraf812/foldr-build. It doesn't have 
>>> the "fusion duplicates work" flaw because there is no data that should have 
>>> shared the work in the first place.
>>>
>>> ---
>>>
>>> TLDR; inlining and specialising `Step` sounds like exactly what is needed. 
>>> Specialisation of concatMap is the hard part for stream fusion, though, and 
>>> that means we'll end up with `Step` in the program nonetheless.
>>>
>>> Btw., Lean's standard `Iterator` framework (inspired by Rust's) will 
>>> implement "stream fusion". (That's what external iteration is.) I'm not 
>>> saying that it's simple or that the optimiser is equipped to handle it, 
>>> though :)
>>>
>>> Cheers,
>>> Sebastian
>>>
>>>
>>> Am Di., 9. Dez. 2025 um 13:45 Uhr schrieb Simon Peyton Jones 
>>> <[email protected]>:
>>>>>
>>>>> I think a solution based on multi-stage programming is the most
>>>>> natural way to tackle the stream fusion problem in a robust manner.
>>>>
>>>>
>>>> How close are we to being able to do that in Haskell?
>>>>
>>>> Simon
>>>>
>>>> On Tue, 9 Dec 2025 at 09:56, Matthew Pickering 
>>>> <[email protected]> wrote:
>>>>>
>>>>> I think a solution based on multi-stage programming is the most
>>>>> natural way to tackle the stream fusion problem in a robust manner.
>>>>>
>>>>> For example: https://andraskovacs.github.io/pdfs/2ltt_icfp24.pdf
>>>>>
>>>>> By allowing a programmer to express which parts of the program should
>>>>> only exist code generation time, you guarantee that fusion occurs.
>>>>> This seems much more aligned with how Haskell programmers think about
>>>>> other things.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Matt
>>>>>
>>>>> On Tue, Dec 9, 2025 at 12:03 AM Harendra Kumar <[email protected]> 
>>>>> wrote:
>>>>> >
>>>>> > On Mon, 8 Dec 2025 at 23:35, Jaro Reinders <[email protected]> 
>>>>> > wrote:
>>>>> > >
>>>>> > > I suggested we could integrate this into GHC by giving an inlining 
>>>>> > > discount to
>>>>> > > any function which contains function that is also mentioned in a 
>>>>> > > rewrite rule.
>>>>> >
>>>>> > There are two distinct fusion mechanisms under consideration here -
>>>>> > foldr/build fusion and stream fusion. Rewrite rules allow us to
>>>>> > replace a combination of functions with an equivalent function,
>>>>> > forming the basis of foldr/build fusion by fusing or eliminating
>>>>> > intermediate functions. Stream fusion on the other hand eliminates
>>>>> > intermediate constructors, the key optimizations for achieving this
>>>>> > are case-of-case and worker-wrapper. Both types of fusion require
>>>>> > inlining but they are clearly different at the implementation level,
>>>>> > even though the same high level ideas may apply in both cases.
>>>>> >
>>>>> > The fusion-plugin deals with stream fusion only -- where inlining and
>>>>> > spec-constr are used to expose constructors which are then eliminated
>>>>> > by case-of-case (for linear pipelines) or worker wrapper (for
>>>>> > recursive loops) transformations. fusion-plugin has been used
>>>>> > extensively in streamly and we have been able to fuse almost all cases
>>>>> > that we wanted to. I would recommend that you try out fusion-plugin on
>>>>> > the vector package, it can help fuse the pesky cases and you can also
>>>>> > gain some insight into how much code bloat it actually causes in
>>>>> > practice. Vector is primarily based on stream fusion, and even though
>>>>> > it uses rewrite rules, those are not for fusion as such I believe.
>>>>> >
>>>>> > > However, I also noted I think this will have a large impact on code 
>>>>> > > size and
>>>>> > > compilation time in larger code bases. Simon agreed, but noted that 
>>>>> > > Harendra's
>>>>> > > plugin is much more selective than what I suggested.
>>>>> >
>>>>> > Typically we have a pipeline of functions connected by fusible
>>>>> > constructors. To fuse the entire pipeline completely we need to inline
>>>>> > all the functions involved in the pipeline. Thus instead of having
>>>>> > multiple small functions we end up creating one large combined
>>>>> > function. However, when we use case-of-case transformation on the
>>>>> > combined function, the size often gets reduced dramatically. In fact
>>>>> > the total size of the end product is usually smaller than the combined
>>>>> > original components in the pipeline. However, ghc requires more memory
>>>>> > and CPU to simplify, especially if the loop becomes too large.
>>>>> >
>>>>> > In our experience aggressive inlining for fusing linear segments of
>>>>> > pipelines even if they are pretty big has almost never been a problem.
>>>>> > If it becomes large we can always break the pipeline by using a
>>>>> > NOINLINE, making a trade off between runtime perf and compile time
>>>>> > resource requirement, but we rarely needed to do that. The code bloat
>>>>> > comes more often from the spec-constr optimization when we are fusing
>>>>> > recursive loops. In some cases we have used
>>>>> > -fspec-constr-recursive=16, which I admit is a very large value, the
>>>>> > compile times can sometimes increase dramatically due to this. We can
>>>>> > always choose to use a lower value which may result in unfused code in
>>>>> > some cases. There may be scope of improvement in GHC for making this
>>>>> > optimization smarter and more selective.
>>>>> >
>>>>> > We can also put some hard thresholds or heuristics here for the cases
>>>>> > when the resulting monolithic functions become too large. One possible
>>>>> > way to decide whether to inline or not could be to check if the size
>>>>> > of the combined/optimized function is smaller than or roughly
>>>>> > equivalent to the sum of the individual parts. If inlining does not
>>>>> > result in elimination of the constructors in subsequent passes, then
>>>>> > the inlining is not useful. But we have seen that it is almost always
>>>>> > possible to eliminate all or most of the constructors.
>>>>> >
>>>>> > > Translating this to the fold/build fusion system in GHC today, we 
>>>>> > > would force
>>>>> > > inlining of any function that calls `foldr` or `build` (or `augment`).
>>>>> > > Essentially, we would want to inline any binding that refers to a 
>>>>> > > function that
>>>>> > > partakes in the rewrite rules for fusion. I'll admit we probably 
>>>>> > > don't have to
>>>>> > > generalise this to all rewrite rules, but even with this restriction 
>>>>> > > it seems
>>>>> > > like a rather aggressive optimisation strategy to me.
>>>>> >
>>>>> > I do not have much experience with using foldr/build. But conceptually
>>>>> > this is quite similar to what we do in stream fusion and similar ideas
>>>>> > should be applicable here as well. In this case we inline to expose
>>>>> > foldr/build primitives and then use rewrite rules to combine those. If
>>>>> > the inlining does not result in further elimination or firing of
>>>>> > rewrite rules then the inlining is not useful. You can probably try
>>>>> > out automatic inlining in the same way as we do in fusion-plugin and
>>>>> > see if code-bloat/compilation times become a real issue, and in which
>>>>> > particular cases, then put heuristics/thresholds to avoid that. Is
>>>>> > there a driving use case for extending this to foldr/build?
>>>>> >
>>>>> > -harendra
>>>>> > _______________________________________________
>>>>> > ghc-devs mailing list -- [email protected]
>>>>> > To unsubscribe send an email to [email protected]
>>>>> _______________________________________________
>>>>> ghc-devs mailing list -- [email protected]
>>>>> To unsubscribe send an email to [email protected]
>>>>
>>>> _______________________________________________
>>>> ghc-devs mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>
>> _______________________________________________
>> ghc-devs mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>
> _______________________________________________
> ghc-devs mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
_______________________________________________
ghc-devs mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Re: A future for fusion

Reply via email to