Re: #{:eduction :performance} Trying to understand when to use eduction

Leon Grapenthin Sun, 19 Jul 2015 10:56:21 -0700

Stuart and Alex, thank you for your replies and recommendations.

I take it then that the problem is the seq casting performed in apply and 
in reduce1.


For now, the only way to avoid apply's seq casting seems to be a hackish 
.doInvoke call.

Kind regards,
 Leon.


On Sunday, July 19, 2015 at 6:34:37 PM UTC+2, Alex Miller wrote:
>
>
>
> On Sunday, July 19, 2015 at 10:53:25 AM UTC-5, Stuart Sierra wrote:
>>
>> Hi Leon,
>>
>> I think this is an edge case related to how varargs functions are 
>> implemented in Clojure.
>>
>> The varargs arity of `max` is implemented with `reduce1`: core.clj line 
>> 1088 
>> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088>
>>
>> `reduce1` is a simplified implementation of "reduce" defined early in 
>> clojure.core before the optimized reduction protocols have been loaded: 
>> core.clj line 894 
>> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894>.
>>  
>> `reduce1` is implemented in terms of lazy sequences, with support for 
>> chunking.
>>
>> So `apply max` defaults to using chunked lazy sequence operations. `map` 
>> and `range` both return chunked sequences.
>>
>> `eduction` returns an Iterable, so when you `apply max` on it, it turns 
>> the Iterable into a Seq, but it's not a chunked seq. Therefore, it's 
>> slightly slower than `apply max` on a chunked seq.
>>
>
> seqs on eductions *are* chunked - they will fall into this case during 
> seq: 
> https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525
>  
> which produces a chunked sequence over an Iterable.
>  
>
>> In this case, to ensure you're using the fast-path internal reduce over 
>> `eduction`, you can use `reduce` directly:
>>
>>     (reduce max 0 (eduction (map inc) (range 100000)))
>>
>> You must provide an init value because `eduction` does not assume the 
>> "init with first element" behavior of sequences.
>>
>> This version, in my informal benchmarking, is the fastest.
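
A rough way to reproduce the comparison discussed above, offered only as a sketch: `time` is a coarse tool, results depend on JVM warm-up and the Clojure version, and the name `n` is introduced here for illustration.

```clojure
;; Illustrative input size, the same as in the original question.
(def n 100000)

;; Lazy, chunked seq consumed through apply (and therefore reduce1).
(time (apply max (map inc (range n))))

;; Eduction consumed through apply: it is converted to a seq first.
(time (apply max (eduction (map inc) (range n))))

;; Eduction consumed through reduce with an explicit init value.
(time (reduce max 0 (eduction (map inc) (range n))))
```

All three expressions return the same maximum; only the route taken to compute it differs. For numbers you can rely on, a proper benchmarking library is a better choice than `time`.
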
>>
>> Lots of functions in clojure.core use `reduce1` in their varargs 
>> implementation. Perhaps they could be changed to use the optimized 
>> `reduce`, but this might add a lot of repeated definitions as 
>> clojure.core is bootstrapping itself. I'm not sure.
>>
>
> For various bootstrapping reasons, this is a hard change.
>  
>
>> In general, I would not assume that `eduction` is automatically faster 
>> than lazy sequences. It will be faster only in the cases where it can use 
>> the optimized reduction protocols such as InternalReduce. If the optimized 
>> path isn't available, many operations will fall back to lazy sequences for 
>> backwards-compatibility. 
>>
>> I would suggest using `eduction` only when you *know* you're going to 
>> consume the result with `reduce` or `transduce`. As always, test first, and 
>> profilers are your friend. :)
>>
>
> Use eduction for delayed eager *non-cached* execution. Seqs give you 
> delayed *cached* execution. 
> If you're doing a transformation once, or if the thing you're doing would 
> consume too many resources if cached, then use eduction.
> If you need to do a transformation once and then use the result multiple 
> times, it's better to use sequence+transducer to get the caching effect and 
> the benefits of reduced allocation during transformation.
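
The cached versus non-cached distinction above can be observed with a small sketch. The names `calls` and `counting-inc` are hypothetical helpers introduced here; the counter exists only to show how often the mapping function runs.

```clojure
;; Hypothetical counter, used only to observe how often the step fn runs.
(def calls (atom 0))

(def counting-inc
  (map (fn [x] (swap! calls inc) (inc x))))

;; eduction: the transformation is re-run on every reduce.
(def ed (eduction counting-inc (range 5)))
(reduce + 0 ed)
(reduce + 0 ed)
(def eduction-calls @calls) ; 10: five elements, two passes

(reset! calls 0)

;; sequence: realized elements are cached, so the second reduce reuses them.
(def sq (sequence counting-inc (range 5)))
(reduce + 0 sq)
(reduce + 0 sq)
(def sequence-calls @calls) ; 5: five elements, one pass
```

The eduction holds no results between passes, which is what keeps memory use low for one-off transformations. The sequence pays for caching once and then serves repeated consumers from the cache.
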
>
> Chunked seqs are surprisingly fast, particularly when all of the 
> operations in a nested transformation are chunked. However, every new layer 
> adds another set of (chunked) sequence allocation. Eduction or anything 
> transducer-based is going to do no seq allocation and execute as a single 
> eager pass. Generally this means that transducer stuff will win more if the 
> collection source is reducible, if the inputs are "large" (more input = 
> more win), or if the number of transformations is >1 (more transformations 
> = more wins).
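
As a sketch of the "more transformations" case, the same two-step pipeline can be written as nested lazy seqs or as one composed transducer. The names `xf`, `lazy-result`, `transduced-result` and `eduction-result` are illustrative only.

```clojure
;; Nested lazy seqs: each layer allocates its own (chunked) sequence.
(def lazy-result
  (reduce + 0 (filter even? (map inc (range 100000)))))

;; Composed transducers: a single eager pass with no intermediate seqs.
(def xf (comp (map inc) (filter even?)))

(def transduced-result
  (transduce xf + 0 (range 100000)))

;; An eduction over the same xf, consumed by reduce, is also a single pass.
(def eduction-result
  (reduce + 0 (eduction xf (range 100000))))
```

All three forms produce the same sum. Whether the transducer versions are measurably faster for a given workload is something to confirm with a benchmark.
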
>
>  
>
>>
>> –S
>>
>>
>>
>> On Saturday, July 18, 2015 at 9:11:45 AM UTC-4, Leon Grapenthin wrote:
>>>
>>> My understanding was that if I pass an eduction to a process using 
>>> reduce, I can save the computer time and space because the per step 
>>> overhead of lazy sequences is gone and also the entire sequence does not 
>>> have to reside in memory at once.
>>>
>>> When I time the difference between (apply max (map inc (range 100000))) 
>>> and (apply max (eduction (map inc) (range 100000))), the lazy-seq variant 
>>> wins.
>>>
>>> I'd like to understand why, and when eductions should be used instead.
>>>
>>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
