On 1/11/23 16:20, Tomas Hlavaty wrote:
Hi Jeremy,

thank you for interesting reading.

Thank you for taking the time to go through it :)

On Wed 04 Jan 2023 at 18:48, Jeremy Steward <jer...@thatgeoguy.ca> wrote:

    My main problem with generators and accumulators is that they
    basically replace all our existing data types with a new type
    (generators) that can then be mapped / filtered over in a unified
    way. After one is done with gmap / gfilter / etc. they can then
    convert this generator back into some kind of type using an
    accumulator or one of the built in generator->type procedures. This
    solves a problem of abstraction by routing around it. Rather than
    worry about what operations are defined on a type, we instead just
    create a type that has all operations work on it.

What are the procedures reduced? and unwrap in your example?
Don't they point to similar issue with transducers?

|reduced?| is a predicate that checks whether a value has been wrapped by |make-reduced|. |unwrap| unwraps a value that has been wrapped by |make-reduced|. A "reduced" value isn't particularly special, but it uses a unique record type (and its predicate) to identify a particular kind of value.

This is how early termination is done with transducers. We could use call/cc, but that would mean threading call/cc through all the transducers / collectors / etc. Instead, if a transducer (e.g. |take|) wants to stop folding over the collection early, it can just wrap the current result in |make-reduced|, which tells the folder "hey, I'm fully reduced" and the folder stops.
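To make that concrete, here is a minimal sketch of the "reduced" protocol in portable R7RS Scheme. The record definition mirrors the names discussed above, and the |take|-like transducer is deliberately simplified (real reducers are multi-arity, with init/complete/step cases); it's illustrative, not the library's exact code.

    ;; Sketch only: a unique record type whose predicate signals
    ;; "stop folding" to the folder.
    (define-record-type <reduced>
      (make-reduced value)
      reduced?
      (value unwrap))

    ;; A take-like transducer (simplified to a two-argument step):
    ;; after n items it wraps the accumulator in make-reduced so the
    ;; folder knows to terminate early -- no call/cc needed.
    (define (take n)
      (lambda (reducer)
        (let ((remaining n))
          (lambda (acc item)
            (set! remaining (- remaining 1))
            (let ((result (reducer acc item)))
              (if (<= remaining 0)
                  (make-reduced result)
                  result))))))

A folder built around this checks |reduced?| after each step and, if true, calls |unwrap| on the result and returns immediately.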

It doesn't really point to the same problem. See, transducers are still monomorphized to the specific type. There is no polymorphism in transducers as the library currently exists. So there's no type deflection going on here.

    This kind of approach is a good first abstraction, but fails because
    it is generally impossible to make this fast. It also doesn’t solve
    the fact that we have type->list and list->type proliferation. If
    anything, it makes it worse because most libraries are not
    generator-aware, and writing generators correctly can be
    tricky. That’s not to say writing any code cannot be tricky, but the
    obvious ways to write a generator are often using
    make-coroutine-generator, which uses call/cc and as a result is
    pretty slow on most Schemes.

It seems that the problem is with the way people write generators in scheme.
Not that generators are "generally impossible to make fast".

Generators are "generally impossible to make fast." If you accept a generator as an argument to a procedure, then you don't know (from within that procedure) whether that generator produces values directly or is the result of several layers of (gmap ... (gmap ... (gmap ... generator))). Note this isn't true of lists and vectors and other collection types: getting a value from a vector is O(1), and getting all values sequentially is O(n).

Because of that, you can very quickly end up with generators that call generators that call generators that call generators and so on. This isn't inherently slow, but those functions aren't local and it's a lot more expensive than a simple fold / named-let over the data.
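To illustrate the layering, here's a small pipeline using SRFI 158 names (|gmap|, |list->generator|). Each |gmap| wraps the previous generator in another closure, so every call to |g| goes through three indirections before it even touches the underlying list:

    (define g
      (gmap (lambda (x) (* x x))        ; third closure
            (gmap (lambda (x) (+ x 1))  ; second closure
                  (gmap (lambda (x) (* 2 x))  ; first closure
                        (list->generator '(1 2 3))))))

    (g) ; => 9, i.e. (expt (+ (* 2 1) 1) 2), after three nested calls

A caller receiving |g| sees only "a generator" and has no way to tell how deep that chain goes.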

Now, for a lot of code that may not be a problem. You might keep things simple or only use generators that directly generate values and aren't chained through several intermediate processes / mappers like above. Alternatively, you might find that a generator is doing something expensive anyway, like a network call. In such cases, maybe the difference between transducers and higher-order generator functions doesn't matter.

For me though, I don't really want to work off that assumption. Transducers are plenty fast and there's not really the same issue of non-locality of the data / functions that one has with generators. The general advice of "benchmark your code and target performance improvements" applies, but if I write a function that accepts a generator, I'd like to be able to guess at what I'm dealing with.

Fun experiment: for any given set of procedures over a generator I'd wager that transducers will probably be faster. You can test this yourself:

    (transduce reader-fold
               ;; All variants of map / filter / etc.
               ...)

I am willing to bet this will be faster than almost any combination of gmap / gfilter / etc. I say almost because if your generator is doing a lot of IO or some kind of expensive network call, then the differences will be completely dwarfed.
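For a concrete side-by-side, here's one rough shape the experiment could take, written against the standard SRFI 158 and SRFI 171 names rather than the library discussed in this thread (and assuming |iota| from SRFI 1 and a |compose| procedure are available). Both pipelines compute the same result; the transducer version fuses the steps into a single pass with no intermediate generator closures:

    (define data (iota 10))

    ;; Generator pipeline (SRFI 158): two closure layers per element.
    (generator->list
     (gfilter even?
              (gmap (lambda (x) (* x x))
                    (list->generator data))))

    ;; Transducer pipeline (SRFI 171): one fused fold over the list.
    (list-transduce (compose (tmap (lambda (x) (* x x)))
                             (tfilter even?))
                    rcons
                    data)

    ;; Both evaluate to (0 4 16 36 64).

Wrapping each in a timing loop over a large |data| is the experiment I'm suggesting above.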

So in the most general case, I'd say yeah - generators are pretty much impossible to make fast. Compared to iterating over a list or vector, I can't know upfront that any given process is going to be bounded within some time / complexity. Because of that, they seem rather unappealing to me. I'll fully admit in the past I didn't always hold this view, but in contrast to transducers (srfi-171 or otherwise) it's incredibly obvious that srfi-158 generators are not going to be aiming at the same performance targets.

Jeremy Steward
