My use case is the following:

1) A single thread reads bytes off the network and produces work.
2) Multiple worker threads actually complete the work. There are only a few 
types of work (say at most 20), but each item takes a variable amount of time.

I first tried using multiple SPSC ring buffers (one connecting the single 
producer to each worker thread) to distribute the work. The producer would 
just round-robin between the SPSC ring buffers, enqueuing some work to 
each worker. This avoids contention, since the producer thread and each 
worker thread are connected by a separate queue. It ends up causing 
imbalanced queues, though: some workers are starved of work while others 
have too much. This especially applies to my workload, where some work 
items take a few seconds while others take a few hundred microseconds.

I thought of a few other options:

1) Use an SPMC ring buffer. The workers compete for a slot on the 
ring buffer and then do their work. The nice thing is that they are now 
never idle if there is work to be done. The bad thing is contention on the 
queue. This is the simplest option, though.

2) Use some kind of work requesting. I just saw the work-requesting design 
(http://www.1024cores.net/home/parallel-computing/concurrent-skip-list/scheduler-algorithm) 
on Dmitry's blog. This seems like a nice pattern, especially since there is 
no contention if most workers are busy. The problem, though, is that if a 
worker is stuck on a large request (one that takes a few seconds), it won't 
be able to hand off work to an idle worker.

3) Build a cost model of requests. For my application this doesn't seem 
like a bad idea. Since I have only a few types of requests (20 or so), I 
could run a tuning phase at startup that builds a cost model for each type 
of task. Say, for an image processing application, the types of work are 
filters like blur, sharpen, resize, etc. We could run a phase at startup 
where we measure how much time each filter takes on images of different 
sizes (user-supplied, maybe). Once we have this information, we could 
build a cost model that estimates how much time, say, a blur of an image 
of size x px * y px would take. Now we could go back to the multiple-SPSC-queues 
design and, instead of round-robin, just assign work to the least-loaded 
worker according to our cost model. So in essence we are trying to ensure 
that the workers never get starved or overworked at the point of scheduling, 
instead of rebalancing after the fact.

Any comments or other methods would be highly appreciated.

You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/9d71e72a-51f1-491d-adfc-31a367ce23d7%40googlegroups.com.