Re: Poor parallelization performance across 18 cores (but not 4)

David Iba Tue, 17 Nov 2015 02:01:10 -0800

Correction: that "do" should be a "doall". (My actual test code was a bit 
different, but each run printed a message when it started, so the slowdown 
isn't caused by delayed evaluation of lazy seqs or anything like that.)
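For reference, here is a minimal sketch of the timing harness with the doall in place. The helper name `run-parallel` is hypothetical, and it returns the results (via `mapv deref`) instead of discarding them so they can be checked. `doall` forces the lazy `for`, so every future is created before any of them is dereferenced.

```clojure
(defn run-parallel
  "Hypothetical helper: starts n-runs futures, each calling the zero-arg
  function f inside its own `time`, then waits for all of them. The outer
  `time` reports the wall-clock time for the whole batch."
  [f n-runs]
  (time
   (let [futs (doall (for [_ (range n-runs)]
                       (future (time (f)))))]
     ;; Block until every future has finished; collect results in order.
     (mapv deref futs))))

;; Usage, with f1/f2 as defined in the original post:
;; (run-parallel f2 18)
```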


On Tuesday, November 17, 2015 at 6:49:16 PM UTC+9, David Iba wrote:
>
> Andy: Interesting. Thanks for pointing out that atom swaps don't go 
> through the STM. Your theory seems plausible... I will try those tests 
> the next time I launch the 18-core instance, though I'm not sure how 
> illuminating the results will be.
>
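On the atom point: `swap!` on an atom is an uncoordinated compare-and-set retry loop, with no STM transaction involved. The sketch below mimics those semantics with `compare-and-set!`. `cas-swap!` is a hypothetical name and this is not clojure.core's actual implementation (which lives in Java), but it shows why a thread-local atom like x* in f2 should never retry: with only one thread touching it, the CAS always succeeds on the first attempt.

```clojure
(defn cas-swap!
  "Hypothetical illustration of swap! semantics: read the current value,
  compute a new one, and try to install it with compare-and-set!. If another
  thread changed the atom in between, the CAS fails and we loop to retry."
  [a f & args]
  (loop []
    (let [old-val @a
          new-val (apply f old-val args)]
      (if (compare-and-set! a old-val new-val)
        new-val
        (recur)))))
```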
> Niels: something along these lines (each thread prints its own time, and 
> the overall time is printed as well):
>
>    (time
>     (let [f f1
>           n-runs 18
>           futs (doall (for [_ (range n-runs)]   ; originally posted as "do"
>                         (future (time (f)))))]
>       (doseq [fut futs]
>         @fut)))
>    
>
> On Tuesday, November 17, 2015 at 5:33:01 PM UTC+9, Niels van Klaveren 
> wrote:
>>
>> Could you also show how you are running these functions in parallel and 
>> timing them? The way you start the functions can have as much impact as 
>> the functions themselves.
>>
>> Regards,
>> Niels
>>
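Picking up Niels's point that the launch mechanism matters: `future` runs on Clojure's shared, unbounded cached thread pool (the same one used by `send-off`). One alternative is a dedicated fixed-size pool from java.util.concurrent. This is only a sketch; `run-on-fixed-pool` is a hypothetical name, and whether it changes the 18-core numbers is untested.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn run-on-fixed-pool
  "Hypothetical alternative to `future`: runs n-runs copies of the zero-arg
  function f on a dedicated fixed-size thread pool, times each run, and
  returns the results in submission order."
  [f n-runs]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int n-runs))
        ;; Clojure fns implement java.util.concurrent.Callable, so they can
        ;; be handed to invokeAll directly.
        tasks (vec (repeat n-runs (fn [] (time (f)))))]
    (try
      ;; invokeAll blocks until every task has completed.
      (mapv #(.get ^Future %) (.invokeAll pool tasks))
      (finally
        (.shutdown pool)))))
```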
>> On Tuesday, November 17, 2015 at 6:38:39 AM UTC+1, David Iba wrote:
>>>
>>> I have functions f1 and f2 below; say they take T1 and T2, respectively, 
>>> when run as a single instance on a single thread. The issue I'm facing 
>>> is that running 18 copies of f2 in parallel on 18 cores takes anywhere 
>>> from 2 to 5 times T2, and more complex functions take absurdly long.
>>>
>>>
>>>    (defn f1 []
>>>      (apply + (range 2e9)))
>>>
>>>    ;; Note: each call to (f2) makes its own x* atom, so the 'swap!'
>>>    ;; should never retry.
>>>    (defn f2 []
>>>      (let [x* (atom {})]
>>>        (loop [i 1e9]
>>>          (when-not (zero? i)
>>>            (swap! x* assoc :k i)
>>>            (recur (dec i))))))
>>>    
>>>
>>> Of note:
>>> - On a 4-core machine, both f1 and f2 parallelize well (roughly T1 and 
>>>   T2 for 4 runs in parallel).
>>> - Running 18 f1's in parallel on the 18-core machine also parallelizes 
>>>   well.
>>> - Disabling hyperthreading doesn't help.
>>> - Based on jvisualvm monitoring, it doesn't seem to be GC-related.
>>> - I also tried a dedicated 18-core EC2 instance and saw the same issue, 
>>>   so it's not related to shared tenancy.
>>> - If I make a jar that runs a single f2 and launch 18 of them in parallel 
>>>   (separate JVMs), it parallelizes well, so I don't think it's 
>>>   machine- or AWS-related.
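To back up the GC bullet with numbers rather than jvisualvm eyeballing, one option is to read the JVM's GarbageCollectorMXBeans before and after a run. A minimal sketch follows; `gc-snapshot` and `with-gc-delta` are hypothetical names, and the counts and times are whatever the running JVM's collectors report. The beans are JVM-wide, so wrapping an entire parallel run captures collections triggered by all threads.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

(defn gc-snapshot
  "Hypothetical helper: map of collector name -> [collection-count
  collection-time-ms] for every garbage collector in this JVM."
  []
  (into {}
        (map (fn [^GarbageCollectorMXBean gc]
               [(.getName gc)
                [(.getCollectionCount gc) (.getCollectionTime gc)]]))
        (ManagementFactory/getGarbageCollectorMXBeans)))

(defn with-gc-delta
  "Hypothetical helper: calls the zero-arg function f and returns
  [result delta], where delta maps each collector name to the
  [collections ms] that accumulated while f ran."
  [f]
  (let [before (gc-snapshot)
        result (f)
        after  (gc-snapshot)]
    [result
     (into {}
           (map (fn [[gc-name [cnt ms]]]
                  (let [[cnt0 ms0] (get before gc-name [0 0])]
                    [gc-name [(- cnt cnt0) (- ms ms0)]])))
           after)]))

;; Usage: (with-gc-delta f2)
```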
>>>
>>> Could it be that running 18 f2's in parallel on a single JVM instance is 
>>> overworking the STM with all the swap! calls? Any other theories?
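One way to test the atom theory directly is a control that keeps f2's loop and per-iteration map allocation but drops the compare-and-set, by storing the map in a `volatile!` instead (Clojure 1.7+). The parameterized `f2-atom`/`f2-volatile` pair below is a hypothetical experiment, not code from the original post. If the volatile version degrades the same way when 18 copies run on one JVM, the atom itself is unlikely to be the culprit.

```clojure
(defn f2-atom
  "f2 with the iteration count as a parameter: one thread-local atom,
  n calls to swap!. Returns the final map."
  [n]
  (let [x* (atom {})]
    (loop [i n]
      (when-not (zero? i)
        (swap! x* assoc :k i)
        (recur (dec i))))
    @x*))

(defn f2-volatile
  "Hypothetical control: same loop and map allocation, but the map lives
  in a volatile!, so there is no compare-and-set per iteration."
  [n]
  (let [x* (volatile! {})]
    (loop [i n]
      (when-not (zero? i)
        (vswap! x* assoc :k i)
        (recur (dec i))))
    @x*))
```

For example, compare `(time (f2-volatile 1e9))` on its own against 18 concurrent copies started with the same harness used for f2.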
>>>
>>> Thanks!
>>>
>>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en