On 07/29/2015 08:44 PM, Frederic Weisbecker wrote:
On Wed, Jul 29, 2015 at 01:24:16PM -0400, Chris Metcalf wrote:
On 07/29/2015 09:23 AM, Frederic Weisbecker wrote:
At a higher level, is the posix-cpu-timers code here really providing the
right semantics? It seems like before, the code was checking a struct
task-specific state, and now you are setting a global state such that if ANY
task anywhere in the system (even on housekeeping cores) has a pending posix
cpu timer, then nothing can go into nohz_full mode.
Perhaps what is needed is a task_struct->tick_dependency to go along with
the system-wide and per-cpu flag words?
That's an excellent point! Indeed the tick dependency check on posix-cpu-timers
was made on task granularity before and now it's a global dependency.
Which means that if any task in the system has a posix-cpu-timer enqueued, it
prevents all CPUs from shutting down the tick. I need to mention that in the
changelog.
Now here is the rationale: I expect that nohz full users are not interested in
posix cpu timers at all. The only chance for one to run without breaking the
isolation is on housekeeping CPUs. So perhaps there is a corner case somewhere
but I assume there isn't until somebody reports an issue.
Keeping a task-level dependency check means that we need to update it on context
switch. Plus it's not only about the task but also the process. So that means two
states to update on context switch and to check from interrupts. I don't think
it's worth the effort if there is no user at all.
I really worry about this! The vision EZchip offers our customers is
that they can run whatever they want on the slow path housekeeping
cores, i.e. random control-plane code. Then, on the fast-path cores,
they run their nohz_full stuff without interruption. Often they don't
even know what the hell is running on their control plane cores - SNMP
or random third-party crap or god knows what. And there is a decent
likelihood that some posix cpu timer code might sneak in.
I see. But note that installing a posix cpu timer ends up triggering an
IPI to all nohz full CPUs. That's how nohz full has always behaved.
So users running posix timers on nohz should already suffer issues anyway.
True now, yes, I'm just looking ahead to doing better when we have
a chance to improve things.
You mentioned needing two fields, for task and for process, but in
fact let's just add the one field to the one thing that needs it and
not worry about additional possible future needs. And note that it's
the task_struct->signal where we need to add the field for posix cpu
timers (the signal_struct) since that's where the sharing occurs, and
given CLONE_SIGHAND I imagine it could be different from the general
"process" model anyway.
Well, posix cpu timers can be installed per process (signal struct) or
per thread (task struct).
But we can certainly simplify that with a per process flag and expand
the thread dependency to the process scope.
Still there is the issue of telling the CPUs where a process runs when
a posix timer is installed there. There is no process-wide equivalent of
tsk->cpus_allowed.
Either we send an IPI everywhere like we do now or we iterate through all
threads in the process to OR all their cpumasks in order to send that IPI.
Is there a reason the actual timer can't run on a housekeeping
core? Then when it does wake_up_process() or whatever, the
specific target task will get an IPI to wake up at that point.
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com