Stuart and Alex, thank you for your replies and recommendations. I take it, then, that the problem is the seq conversion performed in `apply` and in `reduce1`.
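For reference, the kind of seq-walking reduction discussed here can be sketched as below. This is a minimal re-creation based on my reading of clojure.core's private `reduce1`, not its actual source. `seq-reduce` is a hypothetical name, and the sketch assumes Clojure 1.7 or later. Every input is coerced with `seq`. Chunked seqs are folded one chunk at a time via `chunk-first`/`chunk-next`, and any other seq is walked one element per step.

```clojure
(defn seq-reduce
  "Hypothetical seq-based reduce, modeled on clojure.core's private reduce1.
  Every input is first coerced with `seq`; chunked seqs are consumed one
  chunk at a time, other seqs one element at a time."
  ([f coll]
   (let [s (seq coll)]
     (if s
       ;; No init: use the first element, like `reduce` on a seq.
       (seq-reduce f (first s) (next s))
       ;; Empty input: call f with no arguments, as `reduce` does.
       (f))))
  ([f init coll]
   (loop [acc init
          s   (seq coll)]
     (if s
       (if (chunked-seq? s)
         ;; Fold the whole current chunk in one call,
         ;; then continue with the seq that follows the chunk.
         (recur (.reduce (chunk-first s) f acc) (chunk-next s))
         ;; Unchunked seq: one element per step.
         (recur (f acc (first s)) (next s)))
       acc))))
```

For example, `(seq-reduce max (map inc (range 100000)))` gives the same result as `(apply max (map inc (range 100000)))`. Like the simplified `reduce1` it imitates, this sketch does not support early termination with `reduced`.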
For now, the only way to avoid `apply`'s seq conversion seems to be a hackish `.doInvoke` call.

Kind regards,
Leon

On Sunday, July 19, 2015 at 6:34:37 PM UTC+2, Alex Miller wrote:
>
> On Sunday, July 19, 2015 at 10:53:25 AM UTC-5, Stuart Sierra wrote:
>>
>> Hi Leon,
>>
>> I think this is an edge case related to how varargs functions are
>> implemented in Clojure.
>>
>> The varargs arity of `max` is implemented with `reduce1`: core.clj line 1088
>> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088>
>>
>> `reduce1` is a simplified implementation of `reduce` defined early in
>> clojure.core, before the optimized reduction protocols have been loaded:
>> core.clj line 894
>> <https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894>.
>>
>> `reduce1` is implemented in terms of lazy sequences, with support for
>> chunking.
>>
>> So `apply max` defaults to using chunked lazy sequence operations. `map`
>> and `range` both return chunked sequences.
>>
>> `eduction` returns an Iterable, so when you `apply max` on it, it turns
>> the Iterable into a Seq, but it's not a chunked seq. Therefore, it's
>> slightly slower than `apply max` on a chunked seq.
>>
>
> seqs on eductions *are* chunked; they will fall into this case during
> seq:
> https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525
>
> which produces a chunked sequence over an Iterable.
>
>> In this case, to ensure you're using the fast-path internal reduce over
>> `eduction`, you can use `reduce` directly:
>>
>> (reduce max 0 (eduction (map inc) (range 100000)))
>>
>> You must provide an init value because `eduction` does not assume the
>> "init with first element" behavior of sequences.
>>
>> This version, in my informal benchmarking, is the fastest.
>>
>> Lots of functions in clojure.core use `reduce1` in their varargs
>> implementation.
>> Perhaps they could be changed to use the optimized `reduce`, but this
>> might add a lot of repeated definitions as clojure.core is bootstrapping
>> itself. I'm not sure.
>>
>
> For various bootstrapping reasons, this is a hard change.
>
>> In general, I would not assume that `eduction` is automatically faster
>> than lazy sequences. It will be faster only in the cases where it can use
>> the optimized reduction protocols such as InternalReduce. If the optimized
>> path isn't available, many operations will fall back to lazy sequences for
>> backwards-compatibility.
>>
>> I would suggest using `eduction` only when you *know* you're going to
>> consume the result with `reduce` or `transduce`. As always, test first, and
>> profilers are your friend. :)
>>
>
> Use eduction for delayed eager *non-cached* execution. Seqs give you
> delayed *cached* execution.
> If you're doing a transformation once, or if the thing you're doing would
> consume too many resources if cached, then use eduction.
> If you need to do a transformation once and then use the result multiple
> times, it's better to use sequence+transducer to get the caching effect and
> the benefits of reduced allocation during transformation.
>
> Chunked seqs are surprisingly fast, particularly when all of the
> operations in a nested transformation are chunked. However, every new layer
> adds another set of (chunked) sequence allocation. Eduction or anything
> transducer-based is going to do no seq allocation and execute as a single
> eager pass. Generally this means that transducer stuff will win more if the
> collection source is reducible, if the inputs are "large" (more input =
> more win), or if the number of transformations is >1 (more
> transformations = more wins).
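The cached versus non-cached distinction can be observed directly. The sketch below assumes Clojure 1.7 or later, and `reduce-twice` is a hypothetical helper, not part of clojure.core. It reduces the same collection twice and counts how often the mapping function runs. An `eduction` re-runs the transformation on every reduction. `sequence` with a transducer transforms each element once, and the second pass reads the cached values.

```clojure
(defn reduce-twice
  "Hypothetical helper: builds a collection with `make-coll` (e.g. eduction
  or sequence) from a counting (map inc) transducer over (range 10), reduces
  it twice with +, and returns [first-sum second-sum inc-call-count]."
  [make-coll]
  (let [calls (atom 0)
        xf    (map (fn [x]
                     (swap! calls inc) ; record each application of the step
                     (inc x)))
        coll  (make-coll xf (range 10))]
    [(reduce + 0 coll) (reduce + 0 coll) @calls]))

(reduce-twice eduction) ;=> [55 55 20], the transformation ran on both passes
(reduce-twice sequence) ;=> [55 55 10], the second pass reused cached values
```

The trade-off is the one described above. The cached seq avoids repeating the work, but its realized elements stay in memory for as long as the seq is reachable.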
>
>> –S
>>
>> On Saturday, July 18, 2015 at 9:11:45 AM UTC-4, Leon Grapenthin wrote:
>>>
>>> My understanding was that if I pass an eduction to a process using
>>> reduce, I can save the computer time and space because the per step
>>> overhead of lazy sequences is gone and also the entire sequence does not
>>> have to reside in memory at once.
>>>
>>> When I time the difference between (apply max (map inc (range 100000)))
>>> and (apply max (eduction (map inc) (range 100000))), the lazy-seq variant
>>> wins.
>>>
>>> I'd like to understand why, and when eductions should be used instead.
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en