BTW Harendra, what happens if you have sharing
Typically using a stream implies you *really* want constant space use over sharing.
So the stream should just be inlined twice at the cost of code size.

A colleague from WT wrote a whole blog post on this topic a few years back: https://www.well-typed.com/blog/2016/09/sharing-conduit/

On 09/12/2025 10:48, Simon Peyton Jones wrote:

    There are two distinct fusion mechanisms under consideration here -
    foldr/build fusion and stream fusion. Rewrite rules allow us to
    replace a combination of functions with an equivalent function,
    forming the basis of foldr/build fusion by fusing or eliminating
    intermediate functions. Stream fusion on the other hand eliminates
    intermediate constructors


I'm not really getting this distinction.   foldr/build fusion eliminates intermediate lists (cons cells); stream fusion eliminates intermediate Step constructors.   What is the difference, really?

The only difference in my mind is that you can declare the entire Step data type to be fusible, thereby driving aggressive inlining.  Step is *only* an intermediate data type.    But we can't do that for lists because they are ubiquitous.

BTW Harendra, what happens if you have sharing

 let ys :: Stream Int = h x
 in f ys + g ys

where one Stream is consumed by two consumers, f and g.   Now it's not so easy to eliminate the intermediate, and aggressively inlining f, g, and h will simply increase code size.

Simon

On Tue, 9 Dec 2025 at 00:03, Harendra Kumar <[email protected]> wrote:

    On Mon, 8 Dec 2025 at 23:35, Jaro Reinders
    <[email protected]> wrote:
    >
    > I suggested we could integrate this into GHC by giving an
    inlining discount to
    > any function which contains function that is also mentioned in a
    rewrite rule.

    There are two distinct fusion mechanisms under consideration here -
    foldr/build fusion and stream fusion. Rewrite rules allow us to
    replace a combination of functions with an equivalent function,
    forming the basis of foldr/build fusion by fusing or eliminating
    intermediate functions. Stream fusion on the other hand eliminates
    intermediate constructors, the key optimizations for achieving this
    are case-of-case and worker-wrapper. Both types of fusion require
    inlining but they are clearly different at the implementation level,
    even though the same high level ideas may apply in both cases.

    The fusion-plugin deals with stream fusion only -- where inlining and
    spec-constr are used to expose constructors which are then eliminated
    by case-of-case (for linear pipelines) or worker wrapper (for
    recursive loops) transformations. fusion-plugin has been used
    extensively in streamly and we have been able to fuse almost all cases
    that we wanted to. I would recommend that you try out fusion-plugin on
    the vector package, it can help fuse the pesky cases and you can also
    gain some insight into how much code bloat it actually causes in
    practice. Vector is primarily based on stream fusion, and even though
    it uses rewrite rules, those are not for fusion as such I believe.

    > However, I also noted I think this will have a large impact on
    code size and
    > compilation time in larger code bases. Simon agreed, but noted
    that Harendra's
    > plugin is much more selective than what I suggested.

    Typically we have a pipeline of functions connected by fusible
    constructors. To fuse the entire pipeline completely we need to inline
    all the functions involved in the pipeline. Thus instead of having
    multiple small functions we end up creating one large combined
    function. However, when we use case-of-case transformation on the
    combined function, the size often gets reduced dramatically. In fact
    the total size of the end product is usually smaller than the combined
    original components in the pipeline. However, ghc requires more memory
    and CPU to simplify, especially if the loop becomes too large.

    In our experience aggressive inlining for fusing linear segments of
    pipelines even if they are pretty big has almost never been a problem.
    If it becomes large we can always break the pipeline by using a
    NOINLINE, making a trade off between runtime perf and compile time
    resource requirement, but we rarely needed to do that. The code bloat
    comes more often from the spec-constr optimization when we are fusing
    recursive loops. In some cases we have used
    -fspec-constr-recursive=16, which I admit is a very large value, the
    compile times can sometimes increase dramatically due to this. We can
    always choose to use a lower value which may result in unfused code in
    some cases. There may be scope of improvement in GHC for making this
    optimization smarter and more selective.

    We can also put some hard thresholds or heuristics here for the cases
    when the resulting monolithic functions become too large. One possible
    way to decide whether to inline or not could be to check if the size
    of the combined/optimized function is smaller than or roughly
    equivalent to the sum of the individual parts. If inlining does not
    result in elimination of the constructors in subsequent passes, then
    the inlining is not useful. But we have seen that it is almost always
    possible to eliminate all or most of the constructors.

    > Translating this to the fold/build fusion system in GHC today,
    we would force
    > inlining of any function that calls `foldr` or `build` (or
    `augment`).
    > Essentially, we would want to inline any binding that refers to
    a function that
    > partakes in the rewrite rules for fusion. I'll admit we probably
    don't have to
    > generalise this to all rewrite rules, but even with this
    restriction it seems
    > like a rather aggressive optimisation strategy to me.

    I do not have much experience with using foldr/build. But conceptually
    this is quite similar to what we do in stream fusion and similar ideas
    should be applicable here as well. In this case we inline to expose
    foldr/build primitives and then use rewrite rules to combine those. If
    the inlining does not result in further elimination or firing of
    rewrite rules then the inlining is not useful. You can probably try
    out automatic inlining in the same way as we do in fusion-plugin and
    see if code-bloat/compilation times become a real issue, and in which
    particular cases, then put heuristics/thresholds to avoid that. Is
    there a driving use case for extending this to foldr/build?

    -harendra


_______________________________________________
ghc-devs mailing list [email protected]
To unsubscribe send an email [email protected]
_______________________________________________
ghc-devs mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to