Hello, guys.

On Mon, Dec 02, 2013 at 04:37:46PM +0100, Ingo Molnar wrote:
> > - We should strive to never consciously place artificial limitations
> >   on kernel functionality; our main reason to place limitations
> >   should be correctness.
It is a correctness issue tho.  We'd be promising, say, NUMA affinity
and then possibly give out workers which aren't affine to the NUMA node.
Workqueue provides an API to set arbitrary affinity and its users can
rightfully expect the configured affinity and code accordingly.

> > There are cases where limiting affinity is justified: for example
> > the case of single cpu workqueue threads, which are special for
> > their limited concurrency, esp. when coupled with per-cpu resources
> > -- allowing such threads to run on other cpus is a correctness
> > violation and can crash the kernel.

So can any requested affinity if the wq user configured it.  I think
you're too focused on the binary bound / unbound distinction.  Things
are a lot more generic now.

> > - But using it outside this case is overly broad; it dis-allows
> >   usage that is functionally fine and in some cases desired.
> >
> >   In particular; tj argues (
> >   http://lkml.kernel.org/r/[email protected] )
> >
> >   "That's just inviting people to do weirdest things and then
> >   reporting things like "crypt jobs on some of our 500 machines end
> >   up stuck on a single cpu once in a while" which will eventually be
> >   tracked down to some weird shell script setting affinity on
> >   workers doing something else."
> >
> >   While that is exactly what HPC/RT people _want_ in order to avoid
> >   disturbing the other CPUs with said crypt work.

Hmmm... I think you're missing the point there.  The point is that you
can't really do that reliably by meddling with kworkers from userland
directly.  It's gonna be fragile and dangerous.  Userland would have to
continuously poll for new kworkers while kernel workqueue users would
get their affinity expectations broken.  I don't think that's what
HPC/RT people want.

If HPC/RT people want to limit the cpus that unbound workqueues without
specific affinity configured can run on, let's please implement that
properly.
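(For what it's worth, a "proper" mechanism along these lines would be a
single system-wide cpumask applied to unbound workers inside the kernel,
rather than userland poking individual kworkers.  Mainline later grew
exactly such a knob, /sys/devices/virtual/workqueue/cpumask; the CPU
numbers below are illustrative only.  A sketch of how an HPC/RT setup
would use it:)

```sh
# Restrict unbound kworkers (those without a specific affinity
# configured by their workqueue) to housekeeping CPUs 0-1, keeping
# CPUs 2-7 quiet for HPC/RT work.  The value is a hex cpumask;
# writing it requires root.
cat /sys/devices/virtual/workqueue/cpumask    # e.g. ff = CPUs 0-7
echo 03 > /sys/devices/virtual/workqueue/cpumask
```

Per-cpu workers are unaffected by this mask; only unbound workqueues
are migrated, which is exactly the split argued for above.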
> > - Furthermore, the above example is also wrong in that you should
> >   not protect root from itself; there's plenty root can do to shoot
> >   his own foot off, let alone shoot his head off.
> >
> >   Setting affinities is limited to root, and if root messes up the
> >   system he can keep the pieces.  But limiting in an overly broad
> >   fashion stifles the functionality of the system.

This is the same problem with bound workers.  Userland could be actively
breaking configured affinities of kworkers which kernel workqueue users
may be depending upon, and there's no way for userland to tell which
kworker is gonna serve which workqueue - there's no fixed relationship
between them, so it's not like userland can, say, modify affinities on
crypto kworkers in isolation.  Neither userland nor the kernel would
have much idea of the impact of such fiddling.

> > - Lastly; the flag actually blocks entry into cgroups as well as
> >   sched_setaffinity(), so the name is misleading at best.

Yeah, the name could be better.

> > The right fix is to only set PF_THREAD_BOUND on per-cpu workers:
> >
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> >
> >  	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> >
> > -	/* prevent userland from meddling with cpumask of workqueue
> > 	   workers */
> > -	worker->task->flags |= PF_NO_SETAFFINITY;
> > -
> >  	/*
> >  	 * The caller is responsible for ensuring %POOL_DISASSOCIATED
> >  	 * remains stable across this function. See the comments above
> >  	 * the flag definition for details.
> >  	 */
> > -	if (pool->flags & POOL_DISASSOCIATED)
> > +	if (pool->flags & POOL_DISASSOCIATED) {
> >  		worker->flags |= WORKER_UNBOUND;
> > +	} else {
> > +		/*
> > +		 * Prevent userland from meddling with cpumask of
> > +		 * workqueue workers:
> > +		 */
> > +		worker->task->flags |= PF_THREAD_BOUND;
> > +	}
> >
> > Therefore revert the patch and add an annoying but non-destructive
> > warning check against abuse.
> Hm, I missed these problems with the approach, but I think you are
> right.
>
> Tejun, I suspect you concur with Peter's analysis, can I also add

So, not really.

> Peter's workqueue.c fixlet above to workqueue.c to this patch plus
> your Acked-by, so that the two changes are together?

The above would trigger the added warning if a new kworker is created
for a per-cpu workqueue while the cpu is down, which may happen, so the
patch itself is somewhat broken.

I don't think this is the right path to accommodate HPC/RT use cases.
Let's please implement something proper if keeping unbound kworkers off
certain cpus is necessary.

Thanks.

-- 
tejun

