When it comes to fusion of `Generics`, we wrote a paper about performing that fusion in a multi-stage language: https://mpickering.github.io/papers/staged-sop.pdf
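For concreteness, here is a minimal sketch of the kind of `Generics` round trip discussed below. The class, instances, and the `Foo` type are made up for illustration; the point is only that the `Rep` value built by `from` (and, in the `to . from` case, consumed again by `to`) should ideally never exist at runtime:

{-# LANGUAGE DeriveGeneric, FlexibleContexts, TypeOperators #-}
module GSize where

import GHC.Generics

-- A made-up generic "count the fields" function, written over Rep.
class GSize f where
  gsize :: f p -> Int

instance GSize U1 where
  gsize _ = 0
instance GSize (K1 i c) where
  gsize _ = 1
instance (GSize f, GSize g) => GSize (f :*: g) where
  gsize (l :*: r) = gsize l + gsize r
instance (GSize f, GSize g) => GSize (f :+: g) where
  gsize (L1 l) = gsize l
  gsize (R1 r) = gsize r
instance GSize f => GSize (M1 i c f) where
  gsize (M1 x) = gsize x

-- User-facing functions: `from` builds a Rep value (and `to` consumes one)
-- that we would like the compiler to eliminate entirely.
size :: (Generic a, GSize (Rep a)) => a -> Int
size = gsize . from

roundTrip :: Generic a => a -> a
roundTrip = to . from            -- morally the identity function

data Foo = Foo Int Bool Char deriving Generic

fooSize :: Int
fooSize = size (Foo 1 True 'x')  -- ideally optimises to the literal 3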
In general there are soundness issues with Typed Template Haskell that make programming like this brittle in practice; this is what I discussed in my PhD thesis.

Matt

On Tue, Dec 9, 2025 at 4:35 PM Iavor Diatchki <[email protected]> wrote:
>
> Hello,
> I am sure folks are aware of this, but since it has not been mentioned: another example of useful fusion is `to . from` from `Generics`, where we'd like the representation types to never exist at runtime. I think conceptually the problem is similar, but it's a bit interesting in that it has a different flavor to streams, and also there's a type family involved.
> Cheers,
> Iavor
>
> On Tue, Dec 9, 2025 at 5:27 AM Sebastian Graf <[email protected]> wrote:
>>
>> (Perhaps I should have said "returning `Step`" instead of "involving `Step`".)
>>
>> On Tue, 9 Dec 2025 at 14:26, Sebastian Graf <[email protected]> wrote:
>>>
>>> I agree that multi-stage programming is the principled green-field solution. It provides the right model in which to think about solutions for GHC.
>>> For GHC, that model certainly means "Step is just an intermediate data structure", i.e., any computation involving `Step` should only live at compile time ("template polymorphism") and it should be inlined/specialised aggressively.
>>>
>>> This is not too different from the situation with list fusion. Any compiled program that *runs* `map`/`filter`/`foldr`/`build` has a performance bug. You always want to end up with `foldr`'s SAT'd unfolding/explicit `(:)`/`[]` instead of a higher-order function `build` which does the same thing. That is achieved by the intricate multi-phase rewrite-rule framework. If all goes well, the final program never contains any lists whatsoever; only a nesting of tight recursive loops. Note that there is no code sharing of `map`/`filter`/`foldr` here; they must all have been inlined and rewritten and were purely a compile-time thing.
>>>
>>> We want the same for stream fusion because it has better fusion support for `zipWith`. Inlining any function involving `Step` is absolutely the right thing to do! Aggressive specialization as well, but that's really difficult to do using SpecConstr because it blows up so easily. The stream fusion paper instead suggests (in section 7.2) applying the static argument transformation. A draft patch of mine tried to do that on the fly to compute unfoldings for recursive functions with static args: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4553. It lacks polish, and my time and interest, but I firmly believe this should help the stream fusion use case, because SAT+inlining is *exactly* specialization of recursive functions for static arguments.
>>> (Side note: András pointed out to me two years ago that there were further complications, which I don't recall, and which mean that he settled on the practical compromise of going for foldr/build fusion and enabling zip fusion/parallel for with stream fusion.)
>>>
>>> ---
>>>
>>> > what happens if you have sharing
>>> >   let ys :: Stream Int = h x
>>> >   in f ys + g ys
>>>
>>> This is a flaw in the way that GHC says "lists are streams". IMO a stream (such as in streamly) should happily duplicate all the work that goes into computing the elements of `ys`, since it is just code. By contrast, data structures such as lists do not have that luxury. `let ys :: [Int] = ...; in f ys ++ g ys` should not try to fuse `ys`.
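For reference, a minimal sketch of the `Stream`/`Step` types under discussion, essentially the definitions from the stream fusion paper rather than GHC's or streamly's exact ones, illustrating why duplicating a stream only duplicates code, not stored elements:

{-# LANGUAGE BangPatterns, ExistentialQuantification #-}

-- One step of an unfolding: stop, skip, or yield an element and a new seed.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function plus an initial seed, with the seed type
-- hidden.  A Stream value stores no elements, only the code to produce them.
data Stream a = forall s. Stream (s -> Step s a) s

mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream step s0) = Stream step' s0
  where
    step' s = case step s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream step s0) = Stream step' s0
  where
    step' s = case step s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

sumS :: Stream Int -> Int
sumS (Stream step s0) = go 0 s0
  where
    go !acc s = case step s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield x s' -> go (acc + x) s'

-- Consuming a shared stream twice re-runs its step function from the
-- initial seed, i.e. it duplicates the *work* of producing the elements;
-- for a list, the elements would have been computed once and shared.
twice :: Stream Int -> Int
twice ys = sumS ys + sumS (mapS (+ 1) ys)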
>>> Here's a repo that implements just foldr/build fusion without rewriting lists to/from it: https://github.com/sgraf812/foldr-build. It doesn't have the "fusion duplicates work" flaw because there is no data that should have shared the work in the first place.
>>>
>>> ---
>>>
>>> TL;DR: inlining and specialising `Step` sounds like exactly what is needed. Specialisation of concatMap is the hard part for stream fusion, though, and that means we'll end up with `Step` in the program nonetheless.
>>>
>>> Btw., Lean's standard `Iterator` framework (inspired by Rust's) will implement "stream fusion". (That's what external iteration is.) I'm not saying that it's simple or that the optimiser is equipped to handle it, though :)
>>>
>>> Cheers,
>>> Sebastian
>>>
>>> On Tue, 9 Dec 2025 at 13:45, Simon Peyton Jones <[email protected]> wrote:
>>>>>
>>>>> I think a solution based on multi-stage programming is the most natural way to tackle the stream fusion problem in a robust manner.
>>>>
>>>> How close are we to being able to do that in Haskell?
>>>>
>>>> Simon
>>>>
>>>> On Tue, 9 Dec 2025 at 09:56, Matthew Pickering <[email protected]> wrote:
>>>>>
>>>>> I think a solution based on multi-stage programming is the most natural way to tackle the stream fusion problem in a robust manner.
>>>>>
>>>>> For example: https://andraskovacs.github.io/pdfs/2ltt_icfp24.pdf
>>>>>
>>>>> By allowing a programmer to express which parts of the program should only exist at code-generation time, you guarantee that fusion occurs. This seems much more aligned with how Haskell programmers think about other things.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Matt
>>>>>
>>>>> On Tue, Dec 9, 2025 at 12:03 AM Harendra Kumar <[email protected]> wrote:
>>>>> >
>>>>> > On Mon, 8 Dec 2025 at 23:35, Jaro Reinders <[email protected]> wrote:
>>>>> > >
>>>>> > > I suggested we could integrate this into GHC by giving an inlining discount to any function which contains a function that is also mentioned in a rewrite rule.
>>>>> >
>>>>> > There are two distinct fusion mechanisms under consideration here: foldr/build fusion and stream fusion. Rewrite rules allow us to replace a combination of functions with an equivalent function, forming the basis of foldr/build fusion by fusing or eliminating intermediate functions. Stream fusion, on the other hand, eliminates intermediate constructors; the key optimizations for achieving this are case-of-case and worker-wrapper. Both types of fusion require inlining, but they are clearly different at the implementation level, even though the same high-level ideas may apply in both cases.
>>>>> >
>>>>> > The fusion-plugin deals with stream fusion only -- inlining and spec-constr are used to expose constructors, which are then eliminated by the case-of-case (for linear pipelines) or worker-wrapper (for recursive loops) transformations. fusion-plugin has been used extensively in streamly and we have been able to fuse almost all cases that we wanted to. I would recommend that you try out fusion-plugin on the vector package; it can help fuse the pesky cases, and you can also gain some insight into how much code bloat it actually causes in practice. Vector is primarily based on stream fusion, and even though it uses rewrite rules, those are not for fusion as such, I believe.
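To make the case-of-case point concrete, here is a hypothetical sketch, continuing the `Stream`/`Step` sketch above, of what inlining exposes for a linear pipeline like `mapS f . filterS p`; this is illustrative, not fusion-plugin's or GHC's actual output:

-- The composed step function right after inlining: the outer case (map's
-- step) scrutinises an inner case (filter's step) that builds Step values.
composedBefore :: (a -> b) -> (a -> Bool) -> (s -> Step s a) -> s -> Step s b
composedBefore f p step s =
  case (case step s of
          Done       -> Done
          Skip s'    -> Skip s'
          Yield x s' -> if p x then Yield x s' else Skip s') of
    Done       -> Done
    Skip s'    -> Skip s'
    Yield x s' -> Yield (f x) s'

-- After case-of-case pushes the outer case into each inner alternative and
-- case-of-known-constructor simplifies, no intermediate Step survives:
composedAfter :: (a -> b) -> (a -> Bool) -> (s -> Step s a) -> s -> Step s b
composedAfter f p step s =
  case step s of
    Done       -> Done
    Skip s'    -> Skip s'
    Yield x s' -> if p x then Yield (f x) s' else Skip s'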
>>>>> >
>>>>> > > However, I also noted that I think this will have a large impact on code size and compilation time in larger code bases. Simon agreed, but noted that Harendra's plugin is much more selective than what I suggested.
>>>>> >
>>>>> > Typically we have a pipeline of functions connected by fusible constructors. To fuse the entire pipeline completely we need to inline all the functions involved in the pipeline. Thus, instead of having multiple small functions, we end up creating one large combined function. However, when we use the case-of-case transformation on the combined function, the size often gets reduced dramatically. In fact, the total size of the end product is usually smaller than the combined size of the original components in the pipeline. GHC does require more memory and CPU to simplify it, though, especially if the loop becomes too large.
>>>>> >
>>>>> > In our experience, aggressive inlining for fusing linear segments of pipelines, even if they are pretty big, has almost never been a problem. If a pipeline becomes too large we can always break it with a NOINLINE, making a trade-off between runtime performance and compile-time resource requirements, but we have rarely needed to do that. The code bloat comes more often from the spec-constr optimization when we are fusing recursive loops. In some cases we have used -fspec-constr-recursive=16, which I admit is a very large value; the compile times can sometimes increase dramatically because of it. We can always choose a lower value, which may result in unfused code in some cases. There may be scope for improvement in GHC to make this optimization smarter and more selective.
>>>>> >
>>>>> > We can also put some hard thresholds or heuristics here for the cases where the resulting monolithic functions become too large. One possible way to decide whether to inline or not could be to check whether the size of the combined/optimized function is smaller than, or roughly equivalent to, the sum of the individual parts. If inlining does not result in elimination of the constructors in subsequent passes, then the inlining is not useful. But we have seen that it is almost always possible to eliminate all or most of the constructors.
>>>>> >
>>>>> > > Translating this to the fold/build fusion system in GHC today, we would force inlining of any function that calls `foldr` or `build` (or `augment`). Essentially, we would want to inline any binding that refers to a function that partakes in the rewrite rules for fusion. I'll admit we probably don't have to generalise this to all rewrite rules, but even with this restriction it seems like a rather aggressive optimisation strategy to me.
>>>>> >
>>>>> > I do not have much experience with using foldr/build, but conceptually this is quite similar to what we do in stream fusion, and similar ideas should be applicable here as well. In this case we inline to expose the foldr/build primitives and then use rewrite rules to combine them. If the inlining does not result in further elimination or firing of rewrite rules, then the inlining is not useful.
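For readers less familiar with that side, a simplified sketch of the shape of foldr/build fusion. The rule below mirrors the "fold/build" rule in GHC.Base, but the module and definitions here are illustrative, with local copies of foldr/map/sum to sidestep the subtleties of rewriting class methods:

{-# LANGUAGE RankNTypes #-}
module FoldrBuildSketch where

import Prelude hiding (foldr, map, sum)

-- A list-specific foldr; inlining is delayed so the rule has a chance to fire.
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr k z = go
  where
    go []     = z
    go (x:xs) = k x (go xs)
{-# INLINE [0] foldr #-}

-- `build` abstracts a list over its constructors.
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
{-# INLINE [1] build #-}

-- The key rule: consuming a freshly built list never materialises it.
{-# RULES
"foldr/build" forall k z (g :: forall b. (a -> b -> b) -> b -> b).
              foldr k z (build g) = g k z
  #-}

-- Library functions are written (or rewritten by rules) into this form:
map :: (a -> b) -> [a] -> [b]
map f xs = build (\c n -> foldr (\x ys -> c (f x) ys) n xs)
{-# INLINE map #-}

sum :: [Int] -> Int
sum = foldr (+) 0
{-# INLINE sum #-}

-- With everything inlined, `sum (map (+1) (map (*2) xs))` rewrites via
-- "foldr/build" so that neither intermediate list is ever constructed.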
>>>>> > You can probably try out automatic inlining in the same way as we do in fusion-plugin, see whether code bloat or compilation times become a real issue and in which particular cases, and then put heuristics/thresholds in place to avoid that. Is there a driving use case for extending this to foldr/build?
>>>>> >
>>>>> > -harendra
_______________________________________________
ghc-devs mailing list -- [email protected]
To unsubscribe send an email to [email protected]
