----- Original Message ----- > From: "Lai Jiangshan" <[email protected]> > To: "Mathieu Desnoyers" <[email protected]> > Cc: "Ben Maurer" <[email protected]>, "lttng-dev" <[email protected]>, > "Yannick Brosseau" <[email protected]>, > "Paul E. McKenney" <[email protected]>, "Stephen Hemminger" > <[email protected]> > Sent: Thursday, October 23, 2014 12:26:58 AM > Subject: Re: Userspace RCU: workqueue with batching, cheap wakeup, and work > stealing > > Hi, Mathieu
Hi Lai, > > In the code, how this problem be solved? > "All the tasks may continue stealing works without any progress." Good point. It is not. I see two possible solutions for this: 1) We break the circular chain of siblings at the list head. It has the downside that work could accumulate at the first worker of the list, or 2) Whenever we grab work (either from the global queue or from a sibling), we could dequeue one element and keep it to ourself, out of reach of siblings. This would ensure progress, since worker stealing work would always keep one work element to itself, which would therefore ensure progress. I personally prefer (2), since it has fewer downsides. > > Why ___urcu_steal_work() needs cds_wfcq_dequeue_lock()? This lock is to protect operations on the sibling workqueue. Here are the various actors touching it: - The worker attempting to steal from the sibling (with splice-from), - The sibling itself, in urcu_dequeue_work(), with a dequeue operation, Looking at the wfcq synchronization table: * Synchronization table: * * External synchronization techniques described in the API below is * required between pairs marked with "X". No external synchronization * required between pairs marked with "-". * * Legend: * [1] cds_wfcq_enqueue * [2] __cds_wfcq_splice (destination queue) * [3] __cds_wfcq_dequeue * [4] __cds_wfcq_splice (source queue) * [5] __cds_wfcq_first * [6] __cds_wfcq_next * * [1] [2] [3] [4] [5] [6] * [1] - - - - - - * [2] - - - - - - * [3] - - X X X X * [4] - - X - X X * [5] - - X X - - * [6] - - X X - - We need a mutex between splice-from and dequeue since they are executed by different threads. I can replace this with a call to cds_wfcq_splice_blocking() instead, which takes the lock internally. That would be neater. > > It is not a good idea for a worker to call synchronize_rcu() (in > urcu_accept_work()). > A new work may be coming soon. The reason why the worker calls synchronize_rcu() there is to protect the waitqueue lfstack "pop" operation from ABA. It matches the RCU read-side critical sections encompassing urcu_dequeue_wake_single(). Another way to achieve this would be to grab a mutex across calls to urcu_dequeue_wake_single(), but unfortunately this call needs to be done by the dispatcher thread, and I'm trying to keep a low-latency bias in favor of the dispatcher here. In this case, it is done at the expense of overhead in the worker when the worker is about to busy-wait and eventually block, so I thought the tradeoff was not so bad. Thoughts ? Thanks for the feedback! Mathieu > > thanks, > Lai > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
