Hello, I am sure folks are aware of this, but since it has not been mentioned another example of useful fusion is `to . from` from `Generics`, where we'd like the representation types to never exist at runtime. I think conceptually the problem is similar, but it's a bit interesting in that it has a different flavor to streams, and also there's a type family involved. Cheers, Iavor
On Tue, Dec 9, 2025 at 5:27 AM Sebastian Graf <[email protected]> wrote: > (Perhaps I should have said "returning `Step`" instead of "involving > `Step`".) > > Am Di., 9. Dez. 2025 um 14:26 Uhr schrieb Sebastian Graf < > [email protected]>: > >> I agree that multi-stage programming is the principled green field >> solution. It provides the right model in which to think about solutions for >> GHC. >> For GHC, that model certainly means "Step is just an intermediate data >> structure", i.e., any computation involving `Step` should only live at >> compile-time ("template polymorphism") and it should be inlined/specialised >> for aggressively. >> >> This is not too different for the situation with list fusion. Any >> compiled program that *runs* `map`/`filter`/`foldr`/`build` has a >> performance bug. You always want to end up with `foldr`'s SAT'd >> unfolding/explicit `(:)`/`[]` instead of a higher-order function `build` >> which does the same thing. That is achieved by the intricate multiphase >> rewrite rule framework. >> If all goes well, the final program never contains any lists whatsoever; >> only a nesting of tight recursive loops. Note that there is no code sharing >> of `map`/`filter`/`foldr` here; they must all have been inlined and >> rewritten and were purely a compile-time thing. >> >> We want the same for stream fusion because it has better fusion >> support for `zipWith`. Inlining any function involving `Step` is absolutely >> the right thing to do! Aggressive specialization as well, but that's really >> difficult to do using SpecConstr because it blows up so easily. The stream >> fusion paper >> <https://www.cs.tufts.edu/~nr/cs257/archive/duncan-coutts/stream-fusion.pdf> >> instead suggests applying the static argument transformation in section >> 7.2. A draft patch of mine tried to do that on the fly to compute >> unfoldings for recursive functions with static args: >> https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4553. It lacks >> polishing and my time and interest, but I firmly believe this should help >> the stream fusion use case, because SAT+inlining is *exactly* >> specialization of recursive functions for static arguments. >> (Side Note: Andrasz pointed out to me two years ago that there were >> further complications which I don't recall and which mean that he settled >> on the practical compromise of going for foldr/build fusion and enabling >> zip fusion/parallel for with stream fusion.) >> >> --- >> >> > what happens if you have sharing >> > let ys :: Stream Int = h x >> > in f ys + g ys >> >> This is a flaw in the way that GHC says "lists are streams". IMO a stream >> (such as in streamly) should happily duplicate all the work that goes into >> computing the elements of `ys`, since it is just code. By contrast, data >> structures such as lists do not have that luxury. `let ys :: [Int] = ...; >> in f ys ++ g ys` should not try to fuse `ys`. >> Here's a repo that implements just foldr/build fusion without rewriting >> lists to/from it: https://github.com/sgraf812/foldr-build. It doesn't >> have the "fusion duplicates work" flaw because there is no data that should >> have shared the work in the first place. >> >> --- >> >> *TLDR;* inlining and specialising `Step` sounds like exactly what is >> needed. Specialisation of concatMap is the hard part for stream fusion, >> though, and that means we'll end up with `Step` in the program nonetheless. >> >> Btw., Lean's standard `Iterator` framework >> <https://github.com/leanprover/lean4/blob/6e711bf067a0d9dc6623996b4bd3757ac6c46886/src/Init/Data/Iterators/Basic.lean#L356> >> (inspired by Rust's >> <https://doc.rust-lang.org/std/iter/trait.Iterator.html>) will implement >> "stream fusion". (That's what external iteration is.) I'm not saying that >> it's simple or that the optimiser is equipped to handle it, though :) >> >> Cheers, >> Sebastian >> >> >> Am Di., 9. Dez. 2025 um 13:45 Uhr schrieb Simon Peyton Jones < >> [email protected]>: >> >>> I think a solution based on multi-stage programming is the most >>>> natural way to tackle the stream fusion problem in a robust manner. >>> >>> >>> How close are we to being able to do that in Haskell? >>> >>> Simon >>> >>> On Tue, 9 Dec 2025 at 09:56, Matthew Pickering < >>> [email protected]> wrote: >>> >>>> I think a solution based on multi-stage programming is the most >>>> natural way to tackle the stream fusion problem in a robust manner. >>>> >>>> For example: https://andraskovacs.github.io/pdfs/2ltt_icfp24.pdf >>>> >>>> By allowing a programmer to express which parts of the program should >>>> only exist code generation time, you guarantee that fusion occurs. >>>> This seems much more aligned with how Haskell programmers think about >>>> other things. >>>> >>>> Cheers, >>>> >>>> Matt >>>> >>>> On Tue, Dec 9, 2025 at 12:03 AM Harendra Kumar < >>>> [email protected]> wrote: >>>> > >>>> > On Mon, 8 Dec 2025 at 23:35, Jaro Reinders <[email protected]> >>>> wrote: >>>> > > >>>> > > I suggested we could integrate this into GHC by giving an inlining >>>> discount to >>>> > > any function which contains function that is also mentioned in a >>>> rewrite rule. >>>> > >>>> > There are two distinct fusion mechanisms under consideration here - >>>> > foldr/build fusion and stream fusion. Rewrite rules allow us to >>>> > replace a combination of functions with an equivalent function, >>>> > forming the basis of foldr/build fusion by fusing or eliminating >>>> > intermediate functions. Stream fusion on the other hand eliminates >>>> > intermediate constructors, the key optimizations for achieving this >>>> > are case-of-case and worker-wrapper. Both types of fusion require >>>> > inlining but they are clearly different at the implementation level, >>>> > even though the same high level ideas may apply in both cases. >>>> > >>>> > The fusion-plugin deals with stream fusion only -- where inlining and >>>> > spec-constr are used to expose constructors which are then eliminated >>>> > by case-of-case (for linear pipelines) or worker wrapper (for >>>> > recursive loops) transformations. fusion-plugin has been used >>>> > extensively in streamly and we have been able to fuse almost all cases >>>> > that we wanted to. I would recommend that you try out fusion-plugin on >>>> > the vector package, it can help fuse the pesky cases and you can also >>>> > gain some insight into how much code bloat it actually causes in >>>> > practice. Vector is primarily based on stream fusion, and even though >>>> > it uses rewrite rules, those are not for fusion as such I believe. >>>> > >>>> > > However, I also noted I think this will have a large impact on code >>>> size and >>>> > > compilation time in larger code bases. Simon agreed, but noted that >>>> Harendra's >>>> > > plugin is much more selective than what I suggested. >>>> > >>>> > Typically we have a pipeline of functions connected by fusible >>>> > constructors. To fuse the entire pipeline completely we need to inline >>>> > all the functions involved in the pipeline. Thus instead of having >>>> > multiple small functions we end up creating one large combined >>>> > function. However, when we use case-of-case transformation on the >>>> > combined function, the size often gets reduced dramatically. In fact >>>> > the total size of the end product is usually smaller than the combined >>>> > original components in the pipeline. However, ghc requires more memory >>>> > and CPU to simplify, especially if the loop becomes too large. >>>> > >>>> > In our experience aggressive inlining for fusing linear segments of >>>> > pipelines even if they are pretty big has almost never been a problem. >>>> > If it becomes large we can always break the pipeline by using a >>>> > NOINLINE, making a trade off between runtime perf and compile time >>>> > resource requirement, but we rarely needed to do that. The code bloat >>>> > comes more often from the spec-constr optimization when we are fusing >>>> > recursive loops. In some cases we have used >>>> > -fspec-constr-recursive=16, which I admit is a very large value, the >>>> > compile times can sometimes increase dramatically due to this. We can >>>> > always choose to use a lower value which may result in unfused code in >>>> > some cases. There may be scope of improvement in GHC for making this >>>> > optimization smarter and more selective. >>>> > >>>> > We can also put some hard thresholds or heuristics here for the cases >>>> > when the resulting monolithic functions become too large. One possible >>>> > way to decide whether to inline or not could be to check if the size >>>> > of the combined/optimized function is smaller than or roughly >>>> > equivalent to the sum of the individual parts. If inlining does not >>>> > result in elimination of the constructors in subsequent passes, then >>>> > the inlining is not useful. But we have seen that it is almost always >>>> > possible to eliminate all or most of the constructors. >>>> > >>>> > > Translating this to the fold/build fusion system in GHC today, we >>>> would force >>>> > > inlining of any function that calls `foldr` or `build` (or >>>> `augment`). >>>> > > Essentially, we would want to inline any binding that refers to a >>>> function that >>>> > > partakes in the rewrite rules for fusion. I'll admit we probably >>>> don't have to >>>> > > generalise this to all rewrite rules, but even with this >>>> restriction it seems >>>> > > like a rather aggressive optimisation strategy to me. >>>> > >>>> > I do not have much experience with using foldr/build. But conceptually >>>> > this is quite similar to what we do in stream fusion and similar ideas >>>> > should be applicable here as well. In this case we inline to expose >>>> > foldr/build primitives and then use rewrite rules to combine those. If >>>> > the inlining does not result in further elimination or firing of >>>> > rewrite rules then the inlining is not useful. You can probably try >>>> > out automatic inlining in the same way as we do in fusion-plugin and >>>> > see if code-bloat/compilation times become a real issue, and in which >>>> > particular cases, then put heuristics/thresholds to avoid that. Is >>>> > there a driving use case for extending this to foldr/build? >>>> > >>>> > -harendra >>>> > _______________________________________________ >>>> > ghc-devs mailing list -- [email protected] >>>> > To unsubscribe send an email to [email protected] >>>> _______________________________________________ >>>> ghc-devs mailing list -- [email protected] >>>> To unsubscribe send an email to [email protected] >>>> >>> _______________________________________________ >>> ghc-devs mailing list -- [email protected] >>> To unsubscribe send an email to [email protected] >>> >> _______________________________________________ > ghc-devs mailing list -- [email protected] > To unsubscribe send an email to [email protected] >
_______________________________________________ ghc-devs mailing list -- [email protected] To unsubscribe send an email to [email protected]
