On Fri, Jan 10, 2014 at 3:25 AM, Rajiv Kurian <[email protected]> wrote:
>
> On Thursday, January 9, 2014 6:28:21 AM UTC-8, Dmitry Vyukov wrote:
>>
>> On Mon, Dec 23, 2013 at 11:55 AM, Rajiv Kurian <[email protected]> wrote:
>> > My use case is the following:
>> >
>> > 1) A single thread reads bytes off the network and produces work.
>> > 2) Multiple worker threads actually complete the work. There are only
>> > a few types of work (say at most 20), but each one takes a variable
>> > amount of time.
>> >
>> > I first tried using multiple SPSC ring buffers (connecting the single
>> > producer to a worker thread) to distribute the work. The producer
>> > would just round-robin between the SPSC ring buffers and enqueue some
>> > work to each worker. This avoids contention, since the producer thread
>> > and each worker thread are connected by a different queue. It ends up
>> > causing imbalanced queues, though: some workers are starved of work
>> > while others have too much. This especially applies to my workload,
>> > where some work items take a few seconds while others take a few
>> > hundred microseconds.
>>
>> Hi!
>>
>> Yes, this definitely does not look like a good solution.
>>
>> > I thought of a few other options:
>> >
>> > 1) Use an SPMC ring buffer. The workers just compete to get a slot on
>> > the ring buffer and then do their work. The nice thing is that they
>> > are now never idle if there is work to be done. The bad thing is
>> > contention on the queue. This is the simplest option, though.
>> >
>> > 2) Use some kind of work requesting. I just saw the work-requesting
>> > design on Dmitry's blog. This seems like a nice pattern, especially
>> > since, if most workers are busy, there is no contention. The problem,
>> > though, is that if a worker is stuck working on a large request (one
>> > that takes a few seconds), it won't be able to hand off work to an
>> > idle worker.
>> >
>> > 3) Build a cost model of requests. For my application this doesn't
>> > seem like a bad idea. Since I have only a few types of requests (20 or
>> > so), I could run a tuning phase at startup that builds a cost model
>> > for each input task. Say, for an image-processing application, the
>> > types of work are filters like blur, sharpen, resize, etc. We could
>> > run a phase on startup where we check how much time each filter takes
>> > with images of different sizes (user-supplied, maybe). Once we have
>> > this information, we could build a cost model that estimates how much
>> > time, say, a blur of an image of size x px * y px would take. Now we
>> > could go back to the multiple-SPSC-queues design and, instead of
>> > round-robin, just assign work to the lightest-loaded worker based on
>> > our cost model. So in essence we are just trying to ensure that the
>> > workers never get starved or overworked at the source of scheduling,
>> > instead of after the fact.
>>
>> There is also "classical" work stealing. You can have N SPMC queues,
>> the producer enqueues in round-robin, each thread takes one element
>> from its own queue and processes it, and if it's out of work it tries
>> to steal half of the elements from other queues.
>>
>> However, I would consider using an efficient SPMC queue as well,
>> because it will be much simpler. You can take this queue:
>> http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>> and remove the CAS from enqueue, because you have only 1 producer.
>>
>> Or you can even take this modification of Chris' as a base:
>> https://groups.google.com/forum/#!searchin/lock-free/thomasson/lock-free/acjQ3-89abE/a6-Di0GZsyEJ
>>
>> Since you have a single queue in the program, you can afford to pad
>> each element to cache line size. This should keep contention very low.
>
> Thank you, Dmitry. I think I will use an SPMC queue. The application is
> for image processing, and processing each image takes longer than any
> cache contention during dequeue. The automatic load balancing with an
> SPMC queue just makes life easier than first distributing the work and
> then trying to do work stealing.
Makes sense.

--
---
You received this message because you are subscribed to the Google Groups "Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/lock-free/CAEeQi3ucGhk088xp_jyGpK8nEUPHyG8h8Zs%2BeyN7gM0zHYpS6Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.
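[For reference, a minimal C++11 sketch of the variant Dmitry suggests: his bounded MPMC queue from 1024cores with the CAS removed from enqueue() (there is only one producer) and each cell padded to a cache-line size, as discussed in the thread. The class name, the 64-byte cache-line assumption, and the exact memory orderings are illustrative choices, not taken from the thread.]

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer / multi-consumer bounded queue, derived from the
// bounded MPMC queue design: each cell carries a sequence number that
// tells producer and consumers whether the cell is ready for them.
// SIZE must be a power of two.
template <typename T, size_t SIZE>
class SpmcBoundedQueue {
  struct alignas(64) Cell {          // pad each cell to a cache line
    std::atomic<size_t> seq;
    T data;
  };
  Cell cells_[SIZE];
  alignas(64) size_t enqueue_pos_;   // producer-private: plain size_t
  alignas(64) std::atomic<size_t> dequeue_pos_;

 public:
  SpmcBoundedQueue() : enqueue_pos_(0), dequeue_pos_(0) {
    for (size_t i = 0; i != SIZE; ++i)
      cells_[i].seq.store(i, std::memory_order_relaxed);
  }

  // Single producer, so no CAS: just check the cell and bump a
  // non-atomic local position.
  bool enqueue(const T& v) {
    Cell& c = cells_[enqueue_pos_ & (SIZE - 1)];
    if (c.seq.load(std::memory_order_acquire) != enqueue_pos_)
      return false;                  // queue full
    c.data = v;
    c.seq.store(enqueue_pos_ + 1, std::memory_order_release);
    ++enqueue_pos_;
    return true;
  }

  // Multiple consumers compete with a CAS on dequeue_pos_ to claim a
  // cell; this is where the (bounded) contention lives.
  bool dequeue(T& v) {
    Cell* c;
    size_t pos = dequeue_pos_.load(std::memory_order_relaxed);
    for (;;) {
      c = &cells_[pos & (SIZE - 1)];
      size_t seq = c->seq.load(std::memory_order_acquire);
      intptr_t dif = (intptr_t)seq - (intptr_t)(pos + 1);
      if (dif == 0) {
        // Cell is filled; try to claim it. On failure, pos is
        // updated to the current dequeue_pos_ and we retry.
        if (dequeue_pos_.compare_exchange_weak(
                pos, pos + 1, std::memory_order_relaxed))
          break;
      } else if (dif < 0) {
        return false;                // queue empty
      } else {
        pos = dequeue_pos_.load(std::memory_order_relaxed);
      }
    }
    v = c->data;
    // Mark the cell reusable for the producer one full lap later.
    c->seq.store(pos + SIZE, std::memory_order_release);
    return true;
  }
};
```

In the image-processing use case above, the producer thread would enqueue work items and each worker thread would loop on dequeue(), giving the automatic load balancing Rajiv describes.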
