Re: #{:eduction :performance} Trying to understand when to use eduction
Thanks for the correction, Alex.

On Sunday, July 19, 2015 at 12:34:37 PM UTC-4, Alex Miller wrote:

> seqs on eductions *are* chunked - they will fall into this case during seq: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525 which produces a chunked sequence over an Iterable.

--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: #{:eduction :performance} Trying to understand when to use eduction
Stuart and Alex, thank you for your replies and recommendations. I take it then that the problem is the seq casting performed in `apply` and in `reduce1`. For now, the only way to avoid `apply`'s seq casting seems to be a hackish `.doInvoke` call.

Kind regards,
Leon
Re: #{:eduction :performance} Trying to understand when to use eduction
Hi Leon,

I think this is an edge case related to how varargs functions are implemented in Clojure.

The varargs arity of `max` is implemented with `reduce1` (core.clj line 1088): https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088

`reduce1` is a simplified implementation of reduce defined early in clojure.core, before the optimized reduction protocols have been loaded (core.clj line 894): https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894

`reduce1` is implemented in terms of lazy sequences, with support for chunking. So `apply max` defaults to using chunked lazy sequence operations. `map` and `range` both return chunked sequences.

`eduction` returns an Iterable, so when you `apply max` on it, it turns the Iterable into a Seq, but it's not a chunked seq. Therefore, it's slightly slower than `apply max` on a chunked seq.

In this case, to ensure you're using the fast-path internal reduce over `eduction`, you can use `reduce` directly:

    (reduce max 0 (eduction (map inc) (range 10)))

You must provide an init value because `eduction` does not assume the init-with-first-element behavior of sequences.

This version, in my informal benchmarking, is the fastest.

Lots of functions in clojure.core use `reduce1` in their varargs implementation. Perhaps they could be changed to use the optimized `reduce`, but this might add a lot of repeated definitions as clojure.core is bootstrapping itself. I'm not sure.

In general, I would not assume that `eduction` is automatically faster than lazy sequences. It will be faster only in the cases where it can use the optimized reduction protocols such as InternalReduce. If the optimized path isn't available, many operations will fall back to lazy sequences for backwards-compatibility.

I would suggest using `eduction` only when you *know* you're going to consume the result with `reduce` or `transduce`. As always, test first, and profilers are your friend. :)

–S
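As a rough illustration of the point above, the sketch below computes the same maximum three ways: through `apply max` on an eduction (which goes through seq conversion and `reduce1`), through `reduce` called directly on the eduction, and through `transduce` on the source collection. The names `xs`, `via-apply`, `via-reduce` and `via-transduce` are made up for this example, and the init value 0 is only appropriate because every input here is positive. It assumes Clojure 1.7 or later.

```clojure
;; Hypothetical names; assumes Clojure 1.7+ (eduction and transduce).
(def xs (range 10))

;; Goes through the varargs arity of max: apply converts the eduction
;; to a seq, and max then reduces over that seq with reduce1.
(def via-apply (apply max (eduction (map inc) xs)))

;; Calls reduce directly on the eduction, with an explicit init value.
(def via-reduce (reduce max 0 (eduction (map inc) xs)))

;; Skips the eduction and reduces the source with the transducer applied.
(def via-transduce (transduce (map inc) max 0 xs))

(println via-apply via-reduce via-transduce) ; prints: 10 10 10
```

All three give the same answer; which one is fastest on a given input is something to measure rather than assume.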
Re: #{:eduction :performance} Trying to understand when to use eduction
On Sunday, July 19, 2015 at 10:53:25 AM UTC-5, Stuart Sierra wrote:

> `eduction` returns an Iterable, so when you `apply max` on it, it turns the Iterable into a Seq, but it's not a chunked seq. Therefore, it's slightly slower than `apply max` on a chunked seq.

seqs on eductions *are* chunked - they will fall into this case during seq: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525 which produces a chunked sequence over an Iterable.

> Lots of functions in clojure.core use `reduce1` in their varargs implementation. Perhaps they could be changed to use the optimized `reduce`, but this might add a lot of repeated definitions as clojure.core is bootstrapping itself. I'm not sure.

For various bootstrapping reasons, this is a hard change.

> I would suggest using `eduction` only when you *know* you're going to consume the result with `reduce` or `transduce`. As always, test first, and profilers are your friend. :)

Use eduction for delayed eager *non-cached* execution. Seqs give you delayed *cached* execution. If you're doing a transformation once, or if the thing you're doing would consume too many resources if cached, then use eduction. If you need to do a transformation once and then use the result multiple times, it's better to use sequence+transducer to get the caching effect and the benefits of reduced allocation during transformation.

Chunked seqs are surprisingly fast, particularly when all of the operations in a nested transformation are chunked. However, every new layer adds another set of (chunked) sequence allocations. Eduction or anything transducer-based is going to do no seq allocation and execute as a single eager pass. Generally this means that transducer stuff will win more if the collection source is reducible, if the inputs are large (more input = more win), or if the number of transformations is greater than 1 (more transformations = more wins).
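The difference between non-cached and cached execution described above can be made visible with a counter. This is a minimal sketch, assuming Clojure 1.7 or later; `counting-inc`, `eduction-calls` and `sequence-calls` are hypothetical helpers written for this example, not part of clojure.core.

```clojure
;; Hypothetical helper: a transducer that increments each input and
;; bumps the given atom every time the mapping function runs.
(defn counting-inc [counter]
  (map (fn [x] (swap! counter inc) (inc x))))

;; Reduce the same eduction twice and report how often the function ran.
(defn eduction-calls []
  (let [counter (atom 0)
        ed (eduction (counting-inc counter) (range 5))]
    (reduce + 0 ed) ; first pass runs the transformation
    (reduce + 0 ed) ; second pass runs it again, nothing was cached
    @counter))

;; Same experiment with sequence, which returns a cached lazy seq.
(defn sequence-calls []
  (let [counter (atom 0)
        sq (sequence (counting-inc counter) (range 5))]
    (reduce + 0 sq) ; first pass realizes and caches the elements
    (reduce + 0 sq) ; second pass reads the already-realized seq
    @counter))

(println (eduction-calls) (sequence-calls)) ; prints: 10 5
```

The eduction does the work once per consumer, while the seq produced by `sequence` does it once and keeps the results in memory, which is the trade-off to weigh when the result is used more than once.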
#{:eduction :performance} Trying to understand when to use eduction
My understanding was that if I pass an eduction to a process using reduce, I can save the computer time and space because the per step overhead of lazy sequences is gone and also the entire sequence does not have to reside in memory at once. When I time the difference between (apply max (map inc (range 10))) and (apply max (eduction (map inc) (range 10))), the lazy-seq variant wins. I'd like to understand why, and when eductions should be used instead.
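One rough way to reproduce the comparison in the question is a naive timing loop like the one below. This is only a sketch: `time-ms` is a hypothetical helper, the iteration count is arbitrary, and timings taken this way are noisy because of JIT warm-up and garbage collection, so they should be read as indicative at best. It assumes Clojure 1.7 or later.

```clojure
;; Hypothetical helper: calls f n times (n >= 1) and returns
;; [last-result elapsed-milliseconds].
(defn time-ms [n f]
  (let [start   (System/nanoTime)
        result  (loop [i 1, r (f)]
                  (if (< i n)
                    (recur (inc i) (f))
                    r))
        elapsed (/ (- (System/nanoTime) start) 1e6)]
    [result elapsed]))

;; The two expressions from the question, plus reduce called directly.
(def lazy-run     (time-ms 100000 #(apply max (map inc (range 10)))))
(def eduction-run (time-ms 100000 #(apply max (eduction (map inc) (range 10)))))
(def reduce-run   (time-ms 100000 #(reduce max 0 (eduction (map inc) (range 10)))))

(doseq [[label [result ms]] [["apply max over lazy seq" lazy-run]
                             ["apply max over eduction" eduction-run]
                             ["reduce max over eduction" reduce-run]]]
  (println label "->" result "in" ms "ms"))
```

With only ten elements per call, fixed per-call overhead dominates, so larger inputs may show a different ordering.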