All of that prediction logic is per core: each physical core has its own branch prediction unit and BTB (shared between the two hyperthreads on that core when Hyper-Threading is enabled), not per socket or per package.
Thx,
dk
On Wed, Feb 18, 2015 at 11:44 PM, Rajiv Kurian <[email protected]> wrote:
> I have been trying to find out how many branch prediction units there are
> in a typical Haswell, but so far haven't gotten a definitive answer. Is
> there a single unit per core or per socket or per processor? I have been
> pondering on the implications of this on how to architect threaded
> applications. For example I have an application that has worker threads
> that do something like this:
>
> // A single worker thread.
> while (true) {
>   waitForIncomingEventsOnRingBuffer();
>   for each event {
>     foo(event);
>     bar(event);
>     baz(event);
>   }
> }
>
> A work producer produces events and round-robins between the worker
> threads; each worker completely processes the events it receives, so
> each event is processed by a single thread only. One reason for doing
> things this way is that once an event is in cache, it makes sense to
> process it all the way through.
>
> Now imagine that each function has quite a few branches. If the branch
> prediction unit becomes a bottleneck, we might take quite a few
> pipeline flushes from mispredictions. Alternatively we could:
>
> i) Have the master thread put all events on a single buffer.
> ii) Have each worker thread walk every event on this buffer but call
> only one function (foo, bar, or baz). Assuming there are no ordering
> dependencies between the functions, this should be fine. For example,
> imagine foo persists to disk, bar calculates some statistics, and baz
> does the actual processing. The producer only considers a slot empty
> once all three consumers have advanced their respective cursors past
> said slot.
>
> Now in this scenario, if there were indeed a branch prediction unit
> per core, and assuming each thread maps onto a core, we could possibly
> get better branch prediction, since less code executes on each thread.
> There is a secondary benefit too: each function will probably bring in
> some cache lines, so if we split the functions across threads, each
> thread gets its own L1 and L2. If instead every thread executes every
> function, they cause more cache churn.
>
> All of this is conjecture of course, and real measurement would be the
> best way to know. I just wanted to get an understanding of the kinds
> of reasons people use when deciding between:
>
> i) A pipelined architecture (one stage = one specific function) vs. a
> parallel one (parallel workers, each running all the functions).
> Better branch prediction might be one of the concerns.
> ii) In a pipelined architecture, how does one decide where one stage
> ends and the next begins, i.e. how to break up the processing of a
> single unit of work across stages/threads?
>
> Thanks!
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Scalable Synchronization Algorithms" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/lock-free/fe4c6bfb-4a66-4075-be68-a5fd77eff359%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
--
dk