On 1/11/23 16:20, Tomas Hlavaty wrote:
> thank you for interesting reading.

Thank you for taking the time to go through it :)

> On Wed 04 Jan 2023 at 18:48, Jeremy Steward <jer...@thatgeoguy.ca> wrote:
>> My main problem with generators and accumulators is that they
>> basically replace all our existing data types with a new type
>> (generators) that can then be mapped / filtered over in a unified
>> way. After one is done with gmap / gfilter / etc. they can then
>> convert this generator back into some kind of type using an
>> accumulator or one of the built-in generator->type procedures. This
>> solves a problem of abstraction by routing around it. Rather than
>> worry about what operations are defined on a type, we instead just
>> create a type that has all operations work on it.
>
> What are the procedures reduced? and unwrap in your example?
> Don't they point to a similar issue with transducers?
|reduced?| is a predicate that checks whether a value has been wrapped
by |make-reduced|. |unwrap| unwraps a value that has been wrapped by
|make-reduced|. A "reduced" value isn't particularly special, but it
uses a unique record type (and predicate) to identify a particular kind
of result.
This is how early termination is done with transducers. We could use
call/cc, but that would involve injecting call/cc somehow through all
the transducers / collectors / etc. Instead, if a transducer (e.g.
|take|) wants to stop folding over the collection early, it can just
wrap the current result in |make-reduced| which will tell the folder
"hey, I'm fully reduced" and the folder will stop.
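
To make that concrete, here's a minimal sketch of the protocol using
SRFI 9 records. The names match the ones above, but the details are
illustrative rather than copied from the library:

    ;; The "reduced" wrapper: an ordinary record whose only job is to
    ;; signal "stop folding" to whoever is driving the fold.
    (define-record-type <reduced>
      (make-reduced value)   ; wrap a value to request early termination
      reduced?               ; was this value wrapped?
      (value unwrap))        ; recover the wrapped value

    ;; A fold that respects the protocol: as soon as the step function
    ;; returns a reduced value, unwrap it and stop walking the list.
    (define (list-fold-until f seed lst)
      (let loop ((acc seed) (lst lst))
        (cond ((reduced? acc) (unwrap acc))
              ((null? lst) acc)
              (else (loop (f acc (car lst)) (cdr lst))))))

No continuations involved: |take| just returns (make-reduced result)
from its step function and the driver loop notices.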
It doesn't really point to the same problem. See, transducers are still
monomorphized to the specific type. There is no polymorphism in
transducers as the library currently exists. So there's no type
deflection going on here.
>> This kind of approach is a good first abstraction, but fails because
>> it is generally impossible to make this fast. It also doesn't solve
>> the fact that we have type->list and list->type proliferation. If
>> anything, it makes it worse because most libraries are not
>> generator-aware, and writing generators correctly can be
>> tricky. That's not to say writing any code cannot be tricky, but the
>> obvious ways to write a generator are often using
>> make-coroutine-generator, which uses call/cc and as a result is
>> pretty slow on most Schemes.
>
> It seems that the problem is with the way people write generators in
> Scheme, not that generators are "generally impossible to make fast".
Generators are "generally impossible to make fast." If you accept a
generator as an argument to a procedure, then you don't know (from
within that procedure) if that generator produced values directly or is
the result of several layers of (gmap ... (gmap ... (gmap ...
generator))). Note this isn't true of lists and vectors and other
collection types: getting a value from a vector is O(1), and getting all
values sequentially is O(n).
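
To make the layering concrete, here's roughly what a gmap-style
combinator does (a hypothetical |my-gmap| in the spirit of SRFI 158's
gmap; the base generator is likewise just for illustration):

    ;; A simple base generator over a list (R7RS eof-object signals
    ;; exhaustion, as in SRFI 158).
    (define (list->gen lst)
      (lambda ()
        (if (null? lst)
            (eof-object)
            (let ((x (car lst)))
              (set! lst (cdr lst))
              x))))

    ;; Each my-gmap layer is a closure that calls the next generator
    ;; on every element.
    (define (my-gmap f gen)
      (lambda ()
        (let ((x (gen)))
          (if (eof-object? x) x (f x)))))

    ;; Three layers deep: every element now passes through three extra
    ;; closure calls before the consumer sees it, and nothing about the
    ;; resulting generator tells the consumer how deep the chain is.
    (define g
      (my-gmap (lambda (x) (+ x 1))
               (my-gmap (lambda (x) (* x 2))
                        (my-gmap (lambda (x) (* x x))
                                 (list->gen '(1 2 3))))))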
Because of that, you can very quickly end up with generators that call
generators that call generators that call generators and so on. This
isn't inherently slow, but those functions aren't local and it's a lot
more expensive than a simple fold / named-let over the data.
Now, for a lot of code that may not be a problem. You might keep things
simple or only use generators that directly generate values and aren't
chained through several intermediate processes / mappers like above.
Alternatively, you might find that a generator is doing something
expensive anyways, like a network call. In such cases, maybe the
difference between transducers vs. higher-order generator functions is
negligible.
For me though, I don't really want to work off that assumption.
Transducers are plenty fast and there's not really the same issue of
non-locality of the data / functions that one has with generators. The
general advice of "benchmark your code and target performance
improvements" applies, but if I write a function that accepts a
generator, I'd like to be able to guess at what I'm dealing with.
Fun experiment: for any given set of procedures over a generator I'd
wager that transducers will probably be faster. You can test this yourself:
;; All variants of map / filter / etc
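
For instance, a head-to-head along those lines might look like the
following. This is a sketch, assuming SRFI 1 (for iota), SRFI 158 (for
the generator procedures), and SRFI 171 (for the transducer
procedures) are all available:

    (define data (iota 1000000))

    ;; Generator version: every element is threaded through three
    ;; chained generator closures before it reaches the fold.
    (define (via-generators)
      (generator-fold + 0
        (gfilter even?
          (gmap (lambda (x) (* x x))
            (list->generator data)))))

    ;; Transducer version: the steps are composed into a single step
    ;; function and the list is walked once, with no intermediate
    ;; generators. (+ serves as the reducer, as in the SRFI 171
    ;; examples.)
    (define (via-transducers)
      (list-transduce (compose (tmap (lambda (x) (* x x)))
                               (tfilter even?))
                      + data))

Time both on your Scheme of choice and compare.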
I am willing to bet this will be faster than almost any combination of
gmap / gfilter / etc. I say almost because I know if your generator is
doing a lot of IO or some kind of expensive network call then the
differences will be completely dwarfed.
So in the most general case, I'd say yeah - generators are pretty much
impossible to make fast. Compared to iterating over a list or vector, I
can't know upfront that any given process is going to be bounded within
some time / complexity. Because of that, they seem rather unappealing to
me. I'll fully admit in the past I didn't always hold this view, but in
contrast to transducers (srfi-171 or otherwise) it's incredibly obvious
that srfi-158 generators are not going to be aiming at the same
performance targets.