Well, you pay a cost whenever you use seqs instead of reduce, so it's 
strange to think of doseq as "as fast as possible". If its argument is a 
true collection, the collection's IReduce is usually faster than its seq. If 
it is already a sequence, its IReduce usually just walks the seq anyway.
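
A rough way to check which path a given collection will take is to ask 
whether it implements IReduceInit (just a heuristic, and whether a given 
type implements it can vary by Clojure version):

(instance? clojure.lang.IReduceInit (vec (range 10)))
=> true
(instance? clojure.lang.IReduceInit (lazy-seq (range 10)))
=> false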

Let's get some rough numbers here (which is what you should do on a 
case-by-case basis):

(require  '[uncomplicate.fluokitten.core :as f])
=> nil
(def side-effect (constantly nil))
=> #'user/side-effect
(def col1 (vec (range 10000)))
=> #'user/col1
(def col2 (vec (range 10000)))
=> #'user/col2

Single-coll:

(time (dotimes [_ 10] (doseq [x col1] (side-effect x))))
"Elapsed time: 13.916077 msecs"
=> nil
(time (dotimes [_ 10] (dorun (map side-effect col1))))
"Elapsed time: 5.707441 msecs"
=> nil
(time (dotimes [_ 10] (run! side-effect col1)))
"Elapsed time: 1.190621 msecs"
=> nil


Notice there seems to be some overhead to doseq that dorun doesn't have, so 
map+dorun is actually faster (for this particular coll). And run! is the 
fastest because a vector's IReduce is significantly faster than its seq.
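
(For reference, run! is basically just a reduce that throws away its 
accumulator and calls the function for side effects. This is a sketch, not 
the verbatim clojure.core source, and run-sketch! is my name for it:)

(defn run-sketch! [proc coll]
  ;; goes through the collection's own reduce path (IReduce) when available
  (reduce (fn [_ x] (proc x)) nil coll)
  nil)
=> #'user/run-sketch!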

Multi-coll:

(time (dorun (map side-effect col1 col2)))
"Elapsed time: 13.194375 msecs"
=> nil
(time (run! side-effect (map vector col1 col2)))
"Elapsed time: 17.54224 msecs"
=> nil
(time (run! side-effect (mapv vector col1 col2)))
"Elapsed time: 3.892984 msecs"
=> nil
(time (f/foldmap f/op nil side-effect col1 col2))
"Elapsed time: 12.454673 msecs"
=> nil
(time (dorun (sequence (map side-effect) col1 col2)))
"Elapsed time: 31.321698 msecs"
=> nil

Interesting results.

So dorun+map is still pretty good. foldmap wins out a little bit, but is 
still clearly using seqs underneath.

run! over a seq is not better than just consuming the seq. This is because 
the IReduce of a seq uses first/next internally, so there is no IReduce win.
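
(A sketch of what that reduce amounts to for a plain seq, just a first/next 
loop; this is illustrative, not the actual clojure.core source:)

(defn seq-reduce-sketch [f init coll]
  (loop [acc init, s (seq coll)]
    (if s
      (recur (f acc (first s)) (next s))
      acc)))
=> #'user/seq-reduce-sketch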

However, run! over a vector is faster (for this collection, at this size), 
*even though we are creating a bunch more intermediate collections!* The 
IReduce of a vector really is *that much faster* than its seq. So if you 
really need speed and don't care about laziness or memory, it may be faster 
just to build the intermediate collection-of-collections and reduce over it. 
Benchmark your particular hotspot!
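
(time is only a rough measure; if you want numbers you'd actually trust, 
criterium is the usual tool. Assuming it is on your classpath:)

(require '[criterium.core :as crit])
=> nil
(crit/quick-bench (run! side-effect col1))
;; prints mean execution time, standard deviation, outlier info, etc.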

And even though sequence uses iterators internally, there is clearly still 
some extra overhead involved.


On Friday, September 23, 2016 at 10:58:42 PM UTC-5, Mars0i wrote:
>
> Thanks very much Francis.  
>
> So it sounds as if, if my data would already be in a vector, list or 
> sequence (maybe a lazy sequence), doseq will process that structure as fast 
> as possible, but there are other structures that might have faster internal 
> reduce operations.
>
