Re: New egg: CHICKEN Transducers

Jeremy Steward Fri, 13 Jan 2023 16:57:49 -0800

On 1/11/23 16:20, Tomas Hlavaty wrote:

Hi Jeremy,


thank you for interesting reading.


Thank you for taking the time to go through it :)

On Wed 04 Jan 2023 at 18:48, Jeremy Steward <jer...@thatgeoguy.ca> wrote:

<https://www.thatgeoguy.ca/blog/2023/01/04/reflections-on-transducers/>


    My main problem with generators and accumulators is that they
    basically replace all our existing data types with a new type
    (generators) that can then be mapped / filtered over in a unified
    way. After one is done with gmap / gfilter / etc. they can then
    convert this generator back into some kind of type using an
    accumulator or one of the built in generator->type procedures. This
    solves a problem of abstraction by routing around it. Rather than
    worry about what operations are defined on a type, we instead just
    create a type that has all operations work on it.

What are the procedures reduced? and unwrap in your example?
Don't they point to similar issue with transducers?

This is how early termination is done with transducers. We could use call/cc, but that would involve injecting call/cc somehow through all the transducers / collectors / etc. Instead, if a transducer (e.g. |take|) wants to stop folding over the collection early, it can just wrap the current result in |make-reduced|, which tells the folder "hey, I'm fully reduced", and the folder will stop.
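To make that concrete, here is a minimal sketch of the idea in plain Scheme. It is not the egg's actual implementation: the names wrap-reduced, is-reduced?, unwrap-reduced, list-fold-early and take-step are all hypothetical stand-ins chosen for illustration, and the step procedure assumes n is at least 1.

```scheme
;; Hypothetical sketch, not the transducers egg's code.
;; A "reduced" value is a result tagged with a unique marker.
(define reduced-tag (list 'reduced))
(define (wrap-reduced value) (cons reduced-tag value))
(define (is-reduced? x) (and (pair? x) (eq? (car x) reduced-tag)))
(define (unwrap-reduced x) (cdr x))

;; Left fold over a list that stops as soon as the step procedure
;; hands back a wrapped (reduced) accumulator.
(define (list-fold-early step init lst)
  (let loop ((acc init) (rest lst))
    (cond ((is-reduced? acc) (unwrap-reduced acc))
          ((null? rest) acc)
          (else (loop (step acc (car rest)) (cdr rest))))))

;; Step procedure in the spirit of take: conses items onto the
;; accumulator and wraps the result once n items (n >= 1) were seen.
(define (take-step n)
  (let ((seen 0))
    (lambda (acc item)
      (set! seen (+ seen 1))
      (let ((next (cons item acc)))
        (if (>= seen n) (wrap-reduced next) next)))))

;; Only the first three items are ever visited.
(define first-three
  (reverse (list-fold-early (take-step 3) '() '(1 2 3 4 5 6))))
```

The point is that the fold only has to check one tag per step, so no continuation needs to be threaded through the intermediate procedures.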

It doesn't really point to the same problem. See, transducers are still monomorphized to the specific type. There is no polymorphism in transducers as the library currently exists. So there's no type deflection going on here.


    This kind of approach is a good first abstraction, but fails because
    it is generally impossible to make this fast. It also doesn’t solve
    the fact that we have type->list and list->type proliferation. If
    anything, it makes it worse because most libraries are not
    generator-aware, and writing generators correctly can be
    tricky. That’s not to say writing any code cannot be tricky, but the
    obvious ways to write a generator are often using
    make-coroutine-generator which uses call/cc and as a result is
    pretty slow on most Schemes.

It seems that the problem is with the way people write generators in Scheme.
Not that generators are "generally impossible to make fast".

Generators are "generally impossible to make fast." If you accept a generator as an argument to a procedure, then you don't know (from within that procedure) if that generator produced values directly or is the result of several layers of (gmap ... (gmap ... (gmap ... generator))). Note this isn't true of lists and vectors and other collection types: getting a value from a vector is O(1), and getting all values sequentially is O(n).

Because of that, you can very quickly end up with generators that call generators that call generators that call generators and so on. This isn't inherently slow, but those functions aren't local and it's a lot more expensive than a simple fold / named-let over the data.
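A rough sketch of what that layering looks like, using hand-rolled thunk generators rather than the real SRFI 158 procedures. Everything here (done, list->gen, gen-map, gen-filter, gen->list) is a hypothetical stand-in, and a private sentinel is assumed in place of the eof object that SRFI 158 uses:

```scheme
;; Hypothetical stand-ins for SRFI 158 style generators: a generator
;; is a thunk that returns the next value, or `done` when exhausted.
(define done (list 'done))
(define (done? x) (eq? x done))

(define (list->gen lst)
  (lambda ()
    (if (null? lst)
        done
        (let ((x (car lst)))
          (set! lst (cdr lst))
          x))))

;; Each layer is another closure; one value costs one call per layer.
(define (gen-map f gen)
  (lambda ()
    (let ((x (gen)))
      (if (done? x) done (f x)))))

(define (gen-filter keep? gen)
  (lambda ()
    (let loop ((x (gen)))
      (cond ((done? x) done)
            ((keep? x) x)
            (else (loop (gen)))))))

(define (gen->list gen)
  (let loop ((x (gen)) (acc '()))
    (if (done? x)
        (reverse acc)
        (loop (gen) (cons x acc)))))

;; Three nested layers of generators calling generators.
(define via-generators
  (gen->list
   (gen-map (lambda (n) (* n 10))
            (gen-filter even?
                        (gen-map (lambda (n) (+ n 1))
                                 (list->gen '(1 2 3 4 5)))))))

;; The same work as one named-let over the list.
(define via-fold
  (let loop ((rest '(1 2 3 4 5)) (acc '()))
    (if (null? rest)
        (reverse acc)
        (let ((n (+ (car rest) 1)))
          (loop (cdr rest)
                (if (even? n) (cons (* n 10) acc) acc))))))
```

Both produce the same list, but a procedure handed the outermost generator cannot tell how many closures sit behind each call, whereas the named-let does all of its work in one local loop.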

Now, for a lot of code that may not be a problem. You might keep things simple or only use generators that directly generate values and aren't chained through several intermediate processes / mappers like above. Alternatively, you might find that a generator is doing something expensive anyways, like a network call. In such cases, maybe the difference between transducers and higher-order generator functions doesn't matter.

For me though, I don't really want to work off that assumption. Transducers are plenty fast and there's not really the same issue of non-locality of the data / functions that one has with generators. The general advice of "benchmark your code and target performance improvements" applies, but if I write a function that accepts a generator, I'd like to be able to guess at what I'm dealing with.

Fun experiment: for any given set of procedures over a generator I'd wager that transducers will probably be faster. You can test this yourself:


    (transduce reader-fold
               (compose
                 ;; All variants of map / filter / etc
                 )
               collect-list
               generator)

I am willing to bet this will be faster than almost any combination of gmap / gfilter / etc. I say almost because I know if your generator is doing a lot of IO or some kind of expensive network call then the differences will be completely dwarfed.

So in the most general case, I'd say yeah - generators are pretty much impossible to make fast. Compared to iterating over a list or vector, I can't know upfront that any given process is going to be bounded within some time / complexity. Because of that, they seem rather unappealing to me. I'll fully admit in the past I didn't always hold this view, but in contrast to transducers (srfi-171 or otherwise) it's incredibly obvious that srfi-158 generators are not going to be aiming at the same performance targets.


Cheers,
--
Jeremy Steward
