----- Original Message ----- > From: "Mathieu Desnoyers" <[email protected]> > To: "Lai Jiangshan" <[email protected]> > Cc: "lttng-dev" <[email protected]>, "Paul E. McKenney" > <[email protected]>, "Ben Maurer" > <[email protected]>, "Yannick Brosseau" <[email protected]> > Sent: Thursday, October 23, 2014 9:30:12 AM > Subject: Re: [lttng-dev] Userspace RCU: workqueue with batching, cheap > wakeup, and work stealing > > ----- Original Message ----- > > From: "Lai Jiangshan" <[email protected]> > > To: "Mathieu Desnoyers" <[email protected]> > > Cc: "Paul E. McKenney" <[email protected]>, "lttng-dev" > > <[email protected]>, "Ben Maurer" > > <[email protected]>, "Yannick Brosseau" <[email protected]> > > Sent: Thursday, October 23, 2014 5:28:24 AM > > Subject: Re: [lttng-dev] Userspace RCU: workqueue with batching, cheap > > wakeup, and work stealing > > > > Hi Mathieu: > > > > what happen when the wait_node is still queued when > > urcu_worker_unregister() > > is called? > > Hrm, I think this could indeed be an issue. I'm thinking to add a "shutdown" > mode to the workqueue (similarly to folly lifosem). Perhaps this could help > us handling this situation. > > > > > On 10/23/2014 12:26 PM, Lai Jiangshan wrote: > [...] > > > > > > Hi, Mathieu > > > > > > In the code, how this problem be solved? > > > "All the tasks may continue stealing works without any progress." > > > > Can __urcu_wakeup_siblings() be moved to the end of the > > urcu_dequeue_work()? > > It might help, but I don't think it would solve the theoretical > lack-of-progress issue. If we are in a state where all workers threads > are running, none need to be awakened to steal work, and they could fall > into an endless cycle of work-stealing followed by facing an empty work > queue due to another concurrent stealing. I think getting threads to keep > one work item for themselves when they grab work from the global queue > or steal work offers better forward progress guarantees in this case. > > > > > > > > > Why ___urcu_steal_work() needs cds_wfcq_dequeue_lock()? > > Sorry, it does need, it locks the source queue... > > but we have cds_wfcq_splice_blocking() which wrap the locking and the > > splicing. > > Yes, > > > > > > > > > It is not a good idea for a worker to call synchronize_rcu() (in > > > urcu_accept_work()). > > > A new work may be coming soon. > > > > > > I suggest to add a new API: > > __cds_lfs_pop_if(stack, expect) { > > atomic { > > if the top == expect { > > pop it; > > } > > } > > } > > > > Then all the worker can only be woken up by urcu_adaptative_wake_up(), > > rather than urcu_dequeue_wake_single(). > > > > workers dequeues itself when successfully urcu_accept_work() by using > > __cds_lfs_pop_if(). > > There is something I don't like about this approach: what if, for some > reason, a worker being awakened takes a lot of time to start running. > It would therefore stall the entire waitqueue until it can be executed > and pop itself from the waitqueue.
All the above comments should be taken care of by this string of commits: commit a649215929e3af5a0dd26d9fc5bb9ed49e677773 Author: Mathieu Desnoyers <[email protected]> Date: Thu Oct 23 12:19:31 2014 -0400 workqueue: keep one work item from stealing co-workers Takes care of lack of progress issue that could theoretically occur due to a steady work-stealing circular loop amongst co-workers. Keeping a work item hidden from co-workers ensures forward progress. Signed-off-by: Mathieu Desnoyers <[email protected]> commit 6e17009c7e4276b3a90aada07e7c9835e9a788be Author: Mathieu Desnoyers <[email protected]> Date: Thu Oct 23 11:27:58 2014 -0400 workqueue: ensure worker is removed from waitqueue upon unregister Signed-off-by: Mathieu Desnoyers <[email protected]> commit 1b0a9891cd2f9100dc87745d77a4d0069a21adb7 Author: Mathieu Desnoyers <[email protected]> Date: Thu Oct 23 11:13:24 2014 -0400 workqueue: implement shutdown Signed-off-by: Mathieu Desnoyers <[email protected]> commit 0695bd205e8aa88b3b032f483b7dc98c16450784 Author: Mathieu Desnoyers <[email protected]> Date: Thu Oct 23 09:01:29 2014 -0400 workqueue: use cds_wfcq_splice_blocking() rather than explicit locks cds_wfcq_splice_blocking() internally takes the locks. Signed-off-by: Mathieu Desnoyers <[email protected]> Please let me know if I missed anything, Thanks! Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
