Interesting point about bufferbloat! I've just pushed the "approximate queue upper bound" feature in the last commit.
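
For illustration only, here is a minimal sketch of how an approximate upper bound with a non-blocking failure path could look, assuming C11 atomics. The names (bounded_counter, bounded_counter_try_reserve, bounded_counter_release) are hypothetical and are not the API from the commit. The dispatcher would reserve a slot before enqueuing, and a worker would release it once the work item is done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names: this is not the Userspace RCU workqueue API. */
struct bounded_counter {
	atomic_size_t nr_items;	/* approximate number of queued work items */
	size_t max_items;	/* soft upper bound on queued work items */
};

static void bounded_counter_init(struct bounded_counter *bc, size_t max_items)
{
	assert(max_items > 0);
	atomic_init(&bc->nr_items, 0);
	bc->max_items = max_items;
}

/*
 * Try to reserve one slot before enqueuing. Never blocks.
 * The load and the increment are two separate steps, so concurrent
 * dispatchers can overshoot max_items slightly: the bound is
 * approximate by design, which keeps the fast path cheap.
 * Returns false when the queue is considered full, so the caller can
 * drop the request, retry later, or report congestion.
 */
static bool bounded_counter_try_reserve(struct bounded_counter *bc)
{
	if (atomic_load_explicit(&bc->nr_items, memory_order_relaxed)
			>= bc->max_items)
		return false;
	atomic_fetch_add_explicit(&bc->nr_items, 1, memory_order_relaxed);
	return true;
}

/* Release one slot once a work item has been handled or discarded. */
static void bounded_counter_release(struct bounded_counter *bc)
{
	atomic_fetch_sub_explicit(&bc->nr_items, 1, memory_order_relaxed);
}
```

With a bound of 2, two reservations succeed and the third fails until a slot is released. A strict bound would need a compare-and-swap loop instead of the separate load and increment.
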

Going further, when the queue is full, there are indeed a few options:

1) sleep for a few ms and retry the enqueue,
2) grab the entire content of the global workqueue, and discard its
   work elements one by one,
3) in addition to (2), also steal work from all worker threads, and
   discard their work elements.

Making the dispatcher act as a dummy "worker thread" would allow it to
easily accomplish (2). Option (3), stealing all the workers' work
elements, would need some tweaks (new API). This could be presented as
a "urcu_queue_steal_all" or something like that, and then the
dispatcher could iterate over the work items and either discard them
or perform the appropriate socket action.

Thoughts?

Thanks,

Mathieu

----- Original Message -----
> From: "Ben Maurer" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>, "Lai Jiangshan" <[email protected]>
> Cc: "lttng-dev" <[email protected]>, "Paul E. McKenney" <[email protected]>,
>     "Yannick Brosseau" <[email protected]>
> Sent: Thursday, October 23, 2014 6:09:11 PM
> Subject: RE: [lttng-dev] Userspace RCU: workqueue with batching, cheap
>     wakeup, and work stealing
>
> Bounds are pretty critical :-). Often during operational incidents we
> get large buildups in our queues, and these cause problems.
>
> For us, one of the most critical things isn't the memory usage but the
> delay caused to the client. For example, if a server puts incoming
> requests into a queue and that queue grows large, clients experience
> large delays. Since most calls to the server have a short timeout
> (seconds), we'd rather prevent items from entering the queue so that
> we fail fast.
>
> Some of our applications switch to LIFO processing of work items when
> the queue is large. This focuses the processing effort on recent
> requests -- ones which will hopefully get back to the user in time for
> them to see a response.
>
> Long story short: when a queue is overloaded, we'd rather drop some
> requests quickly and serve the other requests with minimal queuing
> delay. Think of queues as bufferbloat applied to work items. In fact,
> we have experimented with some of the bufferbloat techniques on our
> work queues (specifically, CoDel).
>
> -b
> ________________________________________
> From: Mathieu Desnoyers [[email protected]]
> Sent: Thursday, October 23, 2014 2:57 PM
> To: Lai Jiangshan
> Cc: lttng-dev; Paul E. McKenney; Ben Maurer; Yannick Brosseau
> Subject: Re: [lttng-dev] Userspace RCU: workqueue with batching, cheap
>     wakeup, and work stealing
>
> The next thing I'm wondering now: should we include an
> optional bound to the global workqueue size in the API?
>
> I've just had cases here where I stress test the queue
> with very frequent dispatch, and it can fill up memory
> relatively quickly if the workers have a large amount of
> work to do per work-item.
>
> I think the usual way to do this would be to make the
> behavior nonblocking when the queue is full, so the
> dispatcher can take action and move the work away to
> another machine, or report congestion.
>
> Thoughts?
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
