Re: abysmal multicore performance, especially on AMD processors

Marshall Bockrath-Vandegrift Tue, 11 Dec 2012 08:41:06 -0800

Lee Spector <lspec...@hampshire.edu> writes:

> Is the following a fair characterization pending further developments?
>
> If you have a cons-intensive task then even if it can be divided into
> completely independent, long-running subtasks, there is currently no
> known way to get significant speedups by running the subtasks on
> multiple cores within a single Clojure process.


Not quite.  If you had been using `cons` (that is, if the `reverse` in
the benchmark were implemented with `cons`), you would be getting a
perfectly reasonable speedup.  The problem child in this instance is
`conj`.

If my analysis is correct, then the issue is any megamodal call site
(what JVM documentation usually calls megamorphic: one that sees many
different receiver types), such as `conj`, which is invoked in a tight
loop by multiple threads simultaneously.  Any simultaneous invocation of
such call sites introduces contention and reduces speedup, but the
problem only becomes pathological in very tight loops, such as when
performing the minimal work required by the `.cons` [1] implementations
of `Cons` and `PersistentList`.  In these cases the part of the call
which introduces contention is a large enough proportion of the overall
call time that adding threads makes the run slower rather than faster.
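
To make "many receiver types at one call site" concrete, here is a minimal sketch.  The names `samples`, `receiver-types` and `conj-results` are made up for illustration and are not from the benchmark; the sketch only shows that a single `conj` in the source ends up dispatching to several concrete classes, and says nothing about timing.

```clojure
;; Four collections with four different concrete classes
;; (a vector, a list, a hash set, and a Cons cell).
(def samples [[1 2] '(1 2) #{1 2} (cons 0 '(1 2))])

(defn receiver-types
  "Distinct concrete classes of the given collections, i.e. the
  receiver types a single `conj` call site would see for them."
  [colls]
  (set (map class colls)))

;; One `conj` in the source; each call dispatches to the `.cons`
;; method of whatever class the collection happens to be.
(def conj-results (mapv #(conj % 3) samples))

(count (receiver-types samples)) ;=> 4
```

Note that each collection also adds the element in its own position (end of a vector, front of a list), which is a visible sign that different implementations are running behind the same call.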

> In some cases you will be able to get significant speedups by
> separating the subtasks completely and running them in separate
> Clojure processes running on separate JVM instances.  But the speedups
> will be lost (mostly, and you might even experience slowdowns) if you
> try to run them from within a single Clojure process.

For this particular issue, running each task in a separate JVM
eliminates the problem entirely, because no call site is then invoked
simultaneously by multiple threads.

> Or have I missed a currently-available work-around among the many
> suggestions?

You can specialize your application to avoid megamodal call sites in
tight loops.  If you are working with `Cons`-order sequences, just use
`cons` instead of `conj`.  If you are working with vectors, create your
own private implementation of `conj` which you *only* call on vectors.
If you are depending on operations which may/do use `conj` in tight
loops, create your own private re-implementations which don’t, such as
with any of the faster versions of `reverse` earlier in this thread.
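
As a rough sketch of what such specializations could look like (these are illustrative, not the versions posted earlier in the thread, and the names `cons-reverse` and `vec-conj` are hypothetical):

```clojure
(defn cons-reverse
  "Reverse any seqable collection using only `cons`, never `conj`."
  [coll]
  (loop [s (seq coll), acc ()]
    (if s
      (recur (next s) (cons (first s) acc))
      acc)))

(defn vec-conj
  "Append `x` to the vector `v`.  Intended to be called *only* with
  vectors, so that this `.cons` call site is separate from the shared
  one inside `conj`."
  [^clojure.lang.IPersistentVector v x]
  (.cons v x))

(cons-reverse [1 2 3]) ;=> (3 2 1)
(vec-conj [1 2] 3)     ;=> [1 2 3]
```

Whether either one actually helps in a given program depends on the JVM and on what else reaches those call sites, so measure before and after.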

This is suboptimal, but it’s totally possible to work around the issue
with a little bit of analysis and profiling.

[1] Possible point of confusion – the JVM interface method invoked by
the Clojure `conj` function is named `.cons`, for (I assume) historical
reasons.  The Clojure `cons` function, on the other hand, just allocates
a `Cons` object in an entirely monomodal fashion.
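
A small sketch of the naming, using made-up var names (`v`, `via-conj`, `via-method`, `via-cons`) purely for illustration:

```clojure
(def v [1 2])

;; `conj` adds at the collection's natural position by calling the
;; collection's own `.cons` method; for a vector that is the end.
(def via-conj   (conj v 3))
(def via-method (.cons ^clojure.lang.IPersistentCollection v 3))

;; The `cons` function always prepends and returns a seq, not a vector.
(def via-cons (cons 3 v))

(= via-conj via-method)                ;=> true
(instance? clojure.lang.Cons via-cons) ;=> true
```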

-Marshall

