Re: #{:eduction :performance} Trying to understand when to use eduction

2015-07-20 Thread Stuart Sierra
Thanks for the correction, Alex.

On Sunday, July 19, 2015 at 12:34:37 PM UTC-4, Alex Miller wrote:

 seqs on eductions *are* chunked - they will fall into this case during 
 seq: 
 https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525
  
 which produces a chunked sequence over an Iterable.


-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: #{:eduction :performance} Trying to understand when to use eduction

2015-07-19 Thread Leon Grapenthin
Stuart and Alex, thank you for your replies and recommendations.

I take it then that the problem is the seq casting performed in apply and 
in reduce1. 

For now the only way to avoid apply's seq casting seems to be a hackish 
.doInvoke call.

Kind regards,
 Leon.




Re: #{:eduction :performance} Trying to understand when to use eduction

2015-07-19 Thread Stuart Sierra
Hi Leon,

I think this is an edge case related to how varargs functions are 
implemented in Clojure.

The varargs arity of `max` is implemented with `reduce1`: core.clj line 1088 
https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088

`reduce1` is a simplified implementation of reduce defined early in 
clojure.core before the optimized reduction protocols have been loaded: 
core.clj line 894 
https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894
 
`reduce1` is implemented in terms of lazy sequences, with support for 
chunking.

So `apply max` defaults to using chunked lazy sequence operations. `map` 
and `range` both return chunked sequences.

`eduction` returns an Iterable, so when you `apply max` on it, it turns the 
Iterable into a Seq, but it's not a chunked seq. Therefore, it's slightly 
slower than `apply max` on a chunked seq.

In this case, to ensure you're using the fast-path internal reduce over 
`eduction`, you can use `reduce` directly:
(reduce max 0 (eduction (map inc) (range 10)))
You must provide an init value because `eduction` does not assume the 
init-with-first-element behavior of sequences.

This version, in my informal benchmarking, is the fastest.
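
A rough way to compare the three variants discussed in this thread is sketched below. The function names and the input size `n` are made up for illustration, and `time` is only a crude measurement: results will vary with the JVM, warm-up, and Clojure version, so treat this as a starting point rather than a benchmark.

```clojure
;; Hypothetical input size, chosen only so the timings are visible.
(def n 1000000)

;; Lazy-seq pipeline consumed through the varargs arity of max.
(defn max-via-apply-seq []
  (apply max (map inc (range n))))

;; Eduction consumed through apply: it is converted to a seq first.
(defn max-via-apply-eduction []
  (apply max (eduction (map inc) (range n))))

;; Eduction consumed by reduce with an explicit init value.
(defn max-via-reduce-eduction []
  (reduce max 0 (eduction (map inc) (range n))))

;; Print a crude timing for each variant.
(doseq [[label f] [["apply max, lazy seq: " max-via-apply-seq]
                   ["apply max, eduction: " max-via-apply-eduction]
                   ["reduce max, eduction:" max-via-reduce-eduction]]]
  (println label)
  (time (f)))
```

All three return the same value; only the route taken to consume the collection differs. For numbers worth trusting, a proper benchmarking tool and a profiler are a better choice than `time`.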

Lots of functions in clojure.core use `reduce1` in their varargs 
implementation. Perhaps they could be changed to use the optimized `reduce`, 
but this might add a lot of repeated definitions as clojure.core is 
bootstrapping itself. I'm not sure.

In general, I would not assume that `eduction` is automatically faster than 
lazy sequences. It will be faster only in the cases where it can use the 
optimized reduction protocols such as InternalReduce. If the optimized path 
isn't available, many operations will fall back to lazy sequences for 
backwards-compatibility. 

I would suggest using `eduction` only when you *know* you're going to 
consume the result with `reduce` or `transduce`. As always, test first, and 
profilers are your friend. :)

–S






Re: #{:eduction :performance} Trying to understand when to use eduction

2015-07-19 Thread Alex Miller


On Sunday, July 19, 2015 at 10:53:25 AM UTC-5, Stuart Sierra wrote:

 Hi Leon,

 I think this is an edge case related to how varargs functions are 
 implemented in Clojure.

 The varargs arity of `max` is implemented with `reduce1`: core.clj line 
 1088 
 https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L1088

 `reduce1` is a simplified implementation of reduce defined early in 
 clojure.core before the optimized reduction protocols have been loaded: 
 core.clj line 894 
 https://github.com/clojure/clojure/blob/36d665793b43f62cfd22354aced4c6892088abd6/src/clj/clojure/core.clj#L894
  
 `reduce1` is implemented in terms of lazy sequences, with support for 
 chunking.

 So `apply max` defaults to using chunked lazy sequence operations. `map` 
 and `range` both return chunked sequences.

 `eduction` returns an Iterable, so when you `apply max` on it, it turns 
 the Iterable into a Seq, but it's not a chunked seq. Therefore, it's 
 slightly slower than `apply max` on a chunked seq.


seqs on eductions *are* chunked - they will fall into this case during 
seq: 
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/RT.java#L524-L525
 
which produces a chunked sequence over an Iterable.
 

 In this case, to ensure you're using the fast-path internal reduce over 
 `eduction`, you can use `reduce` directly:
 (reduce max 0 (eduction (map inc) (range 10)))
 You must provide an init value because `eduction` does not assume the 
 init-with-first-element behavior of sequences.

 This version, in my informal benchmarking, is the fastest.

 Lots of functions in clojure.core use `reduce1` in their varargs 
 implementation. Perhaps they could be changed to use the optimized `reduce`, 
 but this might add a lot of repeated definitions as clojure.core is 
 bootstrapping itself. I'm not sure.


For various bootstrapping reasons, this is a hard change.
 

 In general, I would not assume that `eduction` is automatically faster 
 than lazy sequences. It will be faster only in the cases where it can use 
 the optimized reduction protocols such as InternalReduce. If the optimized 
 path isn't available, many operations will fall back to lazy sequences for 
 backwards-compatibility. 

 I would suggest using `eduction` only when you *know* you're going to 
 consume the result with `reduce` or `transduce`. As always, test first, and 
 profilers are your friend. :)


Use eduction for delayed eager *non-cached* execution. Seqs give you 
delayed *cached* execution. 
If you're doing a transformation once, or if the thing you're doing would 
consume too many resources if cached, then use eduction.
If you need to do a transformation once and then use the result multiple 
times, it's better to use sequence+transducer to get the caching effect and 
the benefits of reduced allocation during transformation.
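
The cached versus non-cached distinction can be made visible with a small sketch. The counter `calls` and the helper `counting-inc` are hypothetical names introduced only to count how often the mapping function runs; they are not part of any API.

```clojure
;; Hypothetical counter used only to observe how often the function runs.
(def calls (atom 0))

(defn counting-inc [x]
  (swap! calls inc)
  (inc x))

;; eduction: the transformation is re-run on every reduction.
(def ed (eduction (map counting-inc) (range 5)))
(def ed-sums [(reduce + 0 ed) (reduce + 0 ed)])
(def eduction-calls @calls)

;; sequence + transducer: the transformed elements are cached after first use.
(reset! calls 0)
(def sq (sequence (map counting-inc) (range 5)))
(def sq-sums [(reduce + 0 sq) (reduce + 0 sq)])
(def sequence-calls @calls)

(println "eduction:" eduction-calls "calls; sequence:" sequence-calls "calls")
```

Reducing the eduction twice runs `counting-inc` once per element per reduction, while the result of `sequence` is computed once and reused on the second pass.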

Chunked seqs are surprisingly fast, particularly when all of the operations 
in a nested transformation are chunked. However, every new layer adds 
another set of (chunked) sequence allocations. Eduction or anything 
transducer-based does no seq allocation and executes as a single 
eager pass. Generally this means that transducers will win by more if the 
collection source is reducible, if the inputs are large (more input = 
more win), or if the number of transformations is greater than 1 (more 
transformations = more wins).
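
As an illustration of the layering point, here is the same three-step transformation written both ways. The names `xf`, `layered-result`, and `single-pass-result` are made up for this sketch, and it shows only the shape of the two approaches, not their relative speed.

```clojure
;; Nested lazy sequences: each step allocates its own (chunked) seq layer.
(def layered-result
  (->> (range 10)
       (map inc)
       (filter even?)
       (map #(* % %))
       (reduce + 0)))

;; The same steps composed into one transducer and run as a single eager pass.
(def xf (comp (map inc) (filter even?) (map #(* % %))))

(def single-pass-result
  (transduce xf + 0 (range 10)))

(println layered-result single-pass-result)
```

Both forms compute the same sum. The difference is that the composed transducer never builds intermediate sequences, which is where the savings grow as more steps are added.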

 


 –S






#{:eduction :performance} Trying to understand when to use eduction

2015-07-18 Thread Leon Grapenthin
My understanding was that if I pass an eduction to a process using reduce, 
I can save the computer time and space because the per step overhead of 
lazy sequences is gone and also the entire sequence does not have to reside 
in memory at once.

When I time the difference between (apply max (map inc (range 10))) and 
(apply max (eduction (map inc) (range 10))), the lazy-seq variant wins.

I'd like to understand why, and when eductions should be used instead.
