Hello,
I am sure folks are aware of this, but since it has not been mentioned
another example of useful fusion is `to . from` from `Generics`, where we'd
like the representation types to never exist at runtime.  I think
conceptually the problem is similar, but it's a bit interesting in that it
has a different flavor to streams, and also there's a type family involved.
Cheers,
Iavor

On Tue, Dec 9, 2025 at 5:27 AM Sebastian Graf <[email protected]> wrote:

> (Perhaps I should have said "returning `Step`" instead of "involving
> `Step`".)
>
> Am Di., 9. Dez. 2025 um 14:26 Uhr schrieb Sebastian Graf <
> [email protected]>:
>
>> I agree that multi-stage programming is the principled green field
>> solution. It provides the right model in which to think about solutions for
>> GHC.
>> For GHC, that model certainly means "Step is just an intermediate data
>> structure", i.e., any computation involving `Step` should only live at
>> compile-time ("template polymorphism") and it should be inlined/specialised
>> for aggressively.
>>
>> This is not too different for the situation with list fusion. Any
>> compiled program that *runs* `map`/`filter`/`foldr`/`build` has a
>> performance bug. You always want to end up with `foldr`'s SAT'd
>> unfolding/explicit `(:)`/`[]` instead of a higher-order function `build`
>> which does the same thing. That is achieved by the intricate multiphase
>> rewrite rule framework.
>> If all goes well, the final program never contains any lists whatsoever;
>> only a nesting of tight recursive loops. Note that there is no code sharing
>> of `map`/`filter`/`foldr` here; they must all have been inlined and
>> rewritten and were purely a compile-time thing.
>>
>> We want the same for stream fusion because it has better fusion
>> support for `zipWith`. Inlining any function involving `Step` is absolutely
>> the right thing to do! Aggressive specialization as well, but that's really
>> difficult to do using SpecConstr because it blows up so easily. The stream
>> fusion paper
>> <https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/stream-fusion.pdf>
>> instead suggests applying the static argument transformation in section
>> 7.2. A draft patch of mine tried to do that on the fly to compute
>> unfoldings for recursive functions with static args:
>> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4553. It lacks
>> polishing and my time and interest, but I firmly believe this should help
>> the stream fusion use case, because SAT+inlining is *exactly*
>> specialization of recursive functions for static arguments.
>> (Side Note: Andrasz pointed out to me two years ago that there were
>> further complications which I don't recall and which mean that he settled
>> on the practical compromise of going for foldr/build fusion and enabling
>> zip fusion/parallel for with stream fusion.)
>>
>> ---
>>
>> > what happens if you have sharing
>> >   let ys :: Stream Int = h x
>> >   in f ys + g ys
>>
>> This is a flaw in the way that GHC says "lists are streams". IMO a stream
>> (such as in streamly) should happily duplicate all the work that goes into
>> computing the elements of `ys`, since it is just code. By contrast, data
>> structures such as lists do not have that luxury. `let ys :: [Int] = ...;
>> in f ys ++ g ys` should not try to fuse `ys`.
>> Here's a repo that implements just foldr/build fusion without rewriting
>> lists to/from it: https://github.com/sgraf812/foldr-build. It doesn't
>> have the "fusion duplicates work" flaw because there is no data that should
>> have shared the work in the first place.
>>
>> ---
>>
>> *TLDR;* inlining and specialising `Step` sounds like exactly what is
>> needed. Specialisation of concatMap is the hard part for stream fusion,
>> though, and that means we'll end up with `Step` in the program nonetheless.
>>
>> Btw., Lean's standard `Iterator` framework
>> <https://github.com/leanprover/lean4/blob/6e711bf067a0d9dc6623996b4bd3757ac6c46886/src/Init/Data/Iterators/Basic.lean#L356>
>> (inspired by Rust's
>> <https://doc.rust-lang.org/std/iter/trait.Iterator.html>) will implement
>> "stream fusion". (That's what external iteration is.) I'm not saying that
>> it's simple or that the optimiser is equipped to handle it, though :)
>>
>> Cheers,
>> Sebastian
>>
>>
>> Am Di., 9. Dez. 2025 um 13:45 Uhr schrieb Simon Peyton Jones <
>> [email protected]>:
>>
>>> I think a solution based on multi-stage programming is the most
>>>> natural way to tackle the stream fusion problem in a robust manner.
>>>
>>>
>>> How close are we to being able to do that in Haskell?
>>>
>>> Simon
>>>
>>> On Tue, 9 Dec 2025 at 09:56, Matthew Pickering <
>>> [email protected]> wrote:
>>>
>>>> I think a solution based on multi-stage programming is the most
>>>> natural way to tackle the stream fusion problem in a robust manner.
>>>>
>>>> For example: https://andraskovacs.github.io/pdfs/2ltt_icfp24.pdf
>>>>
>>>> By allowing a programmer to express which parts of the program should
>>>> only exist code generation time, you guarantee that fusion occurs.
>>>> This seems much more aligned with how Haskell programmers think about
>>>> other things.
>>>>
>>>> Cheers,
>>>>
>>>> Matt
>>>>
>>>> On Tue, Dec 9, 2025 at 12:03 AM Harendra Kumar <
>>>> [email protected]> wrote:
>>>> >
>>>> > On Mon, 8 Dec 2025 at 23:35, Jaro Reinders <[email protected]>
>>>> wrote:
>>>> > >
>>>> > > I suggested we could integrate this into GHC by giving an inlining
>>>> discount to
>>>> > > any function which contains function that is also mentioned in a
>>>> rewrite rule.
>>>> >
>>>> > There are two distinct fusion mechanisms under consideration here -
>>>> > foldr/build fusion and stream fusion. Rewrite rules allow us to
>>>> > replace a combination of functions with an equivalent function,
>>>> > forming the basis of foldr/build fusion by fusing or eliminating
>>>> > intermediate functions. Stream fusion on the other hand eliminates
>>>> > intermediate constructors, the key optimizations for achieving this
>>>> > are case-of-case and worker-wrapper. Both types of fusion require
>>>> > inlining but they are clearly different at the implementation level,
>>>> > even though the same high level ideas may apply in both cases.
>>>> >
>>>> > The fusion-plugin deals with stream fusion only -- where inlining and
>>>> > spec-constr are used to expose constructors which are then eliminated
>>>> > by case-of-case (for linear pipelines) or worker wrapper (for
>>>> > recursive loops) transformations. fusion-plugin has been used
>>>> > extensively in streamly and we have been able to fuse almost all cases
>>>> > that we wanted to. I would recommend that you try out fusion-plugin on
>>>> > the vector package, it can help fuse the pesky cases and you can also
>>>> > gain some insight into how much code bloat it actually causes in
>>>> > practice. Vector is primarily based on stream fusion, and even though
>>>> > it uses rewrite rules, those are not for fusion as such I believe.
>>>> >
>>>> > > However, I also noted I think this will have a large impact on code
>>>> size and
>>>> > > compilation time in larger code bases. Simon agreed, but noted that
>>>> Harendra's
>>>> > > plugin is much more selective than what I suggested.
>>>> >
>>>> > Typically we have a pipeline of functions connected by fusible
>>>> > constructors. To fuse the entire pipeline completely we need to inline
>>>> > all the functions involved in the pipeline. Thus instead of having
>>>> > multiple small functions we end up creating one large combined
>>>> > function. However, when we use case-of-case transformation on the
>>>> > combined function, the size often gets reduced dramatically. In fact
>>>> > the total size of the end product is usually smaller than the combined
>>>> > original components in the pipeline. However, ghc requires more memory
>>>> > and CPU to simplify, especially if the loop becomes too large.
>>>> >
>>>> > In our experience aggressive inlining for fusing linear segments of
>>>> > pipelines even if they are pretty big has almost never been a problem.
>>>> > If it becomes large we can always break the pipeline by using a
>>>> > NOINLINE, making a trade off between runtime perf and compile time
>>>> > resource requirement, but we rarely needed to do that. The code bloat
>>>> > comes more often from the spec-constr optimization when we are fusing
>>>> > recursive loops. In some cases we have used
>>>> > -fspec-constr-recursive=16, which I admit is a very large value, the
>>>> > compile times can sometimes increase dramatically due to this. We can
>>>> > always choose to use a lower value which may result in unfused code in
>>>> > some cases. There may be scope of improvement in GHC for making this
>>>> > optimization smarter and more selective.
>>>> >
>>>> > We can also put some hard thresholds or heuristics here for the cases
>>>> > when the resulting monolithic functions become too large. One possible
>>>> > way to decide whether to inline or not could be to check if the size
>>>> > of the combined/optimized function is smaller than or roughly
>>>> > equivalent to the sum of the individual parts. If inlining does not
>>>> > result in elimination of the constructors in subsequent passes, then
>>>> > the inlining is not useful. But we have seen that it is almost always
>>>> > possible to eliminate all or most of the constructors.
>>>> >
>>>> > > Translating this to the fold/build fusion system in GHC today, we
>>>> would force
>>>> > > inlining of any function that calls `foldr` or `build` (or
>>>> `augment`).
>>>> > > Essentially, we would want to inline any binding that refers to a
>>>> function that
>>>> > > partakes in the rewrite rules for fusion. I'll admit we probably
>>>> don't have to
>>>> > > generalise this to all rewrite rules, but even with this
>>>> restriction it seems
>>>> > > like a rather aggressive optimisation strategy to me.
>>>> >
>>>> > I do not have much experience with using foldr/build. But conceptually
>>>> > this is quite similar to what we do in stream fusion and similar ideas
>>>> > should be applicable here as well. In this case we inline to expose
>>>> > foldr/build primitives and then use rewrite rules to combine those. If
>>>> > the inlining does not result in further elimination or firing of
>>>> > rewrite rules then the inlining is not useful. You can probably try
>>>> > out automatic inlining in the same way as we do in fusion-plugin and
>>>> > see if code-bloat/compilation times become a real issue, and in which
>>>> > particular cases, then put heuristics/thresholds to avoid that. Is
>>>> > there a driving use case for extending this to foldr/build?
>>>> >
>>>> > -harendra
>>>> > _______________________________________________
>>>> > ghc-devs mailing list -- [email protected]
>>>> > To unsubscribe send an email to [email protected]
>>>> _______________________________________________
>>>> ghc-devs mailing list -- [email protected]
>>>> To unsubscribe send an email to [email protected]
>>>>
>>> _______________________________________________
>>> ghc-devs mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
>>>
>> _______________________________________________
> ghc-devs mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
_______________________________________________
ghc-devs mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to