On Fri, Jan 24, 2025 at 04:58:03PM -0500, Steven Rostedt wrote:
> On Thu, 23 Jan 2025 23:13:26 +0100
> Peter Zijlstra <pet...@infradead.org> wrote:
>
> > -EPONIES, you cannot take faults from the middle of schedule(). They can
> > always use the best effort FP unwind we have today.
>
> Agreed.
>
> Now the only thing I could think of is a flag gets set where the task comes
> out of the scheduler and then does the stack trace. It doesn't need to do
> the stack trace before it schedules. As it did just schedule, where ever it
> scheduled must have been in a schedulable context.
>
> That is, kind of like the task_work flag for entering user space and
> exiting the kernel, could we have a sched_work flag to run after after being
> scheduled back (exiting schedule()). Since the task has been picked to run,
> it will not cause latency for other tasks. The work will be done in its
> context. This is no different to the tasks accounting than if it does this
> going back to user space. Heck, it would only need to do this once if it
> didn't go back to user space, as the user space stack would be the same.
> That is, if it gets scheduled multiple times, this would only happen on the
> first instance until it leaves the kernel.
>
>
> [ trigger stack trace - set sched_work ]
>
> schedule() {
>         context_switch() -> CPU runs some other task
>                          <- gets scheduled back onto the CPU
>         [..]
>         /* preemption enabled ... */
>         if (sched_work) {
>                 do stack trace() // can schedule here but
>                                  // calls a schedule function that does not
>                                  // do sched_work to prevent recursion
>         }
> }
>
> Could something like this work?

Yeah, this is basically a more fleshed out version of what I was trying
to propose.

One additional wrinkle is that if @prev wakes up on another CPU while
@next is unwinding it, the unwind goes haywire.  So that would maybe
need to be prevented.

-- 
Josh
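
For illustration only, here is a rough userspace model of the sched_work
idea quoted above. It is a sketch under stated assumptions, not kernel
code: struct sim_task, its fields, and every sim_* function are
hypothetical names made up for this example, and nothing here corresponds
to an existing kernel API. It models three points from the proposal: the
flag is consumed on the way out of schedule(), the unwind itself uses a
schedule variant that skips the hook so it cannot recurse, and the unwind
runs at most once until the task returns to user space.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state; not a real task_struct. */
struct sim_task {
	bool sched_work;      /* deferred user stack trace requested */
	bool unwind_cached;   /* already unwound since the last return to user */
	unsigned int unwinds; /* number of times the deferred unwind ran */
};

/* Models a schedule function that does NOT run sched_work. */
void sim_schedule_nowork(struct sim_task *t)
{
	(void)t; /* the context switch away and back would happen here */
}

/* "[ trigger stack trace - set sched_work ]" */
void sim_request_unwind(struct sim_task *t)
{
	/* The user stack cannot change while the task stays in the kernel. */
	if (!t->unwind_cached)
		t->sched_work = true;
}

/* Stand-in for the sleepable user stack unwind. */
void sim_do_unwind(struct sim_task *t)
{
	assert(!t->sched_work);   /* flag was consumed before we got here */
	sim_schedule_nowork(t);   /* e.g. blocking on a fault; no recursion */
	t->unwinds++;
	t->unwind_cached = true;
}

/* Models schedule() with the proposed hook after being scheduled back. */
void sim_schedule(struct sim_task *t)
{
	sim_schedule_nowork(t);
	/* Back on the CPU, preemption enabled. */
	if (t->sched_work) {
		t->sched_work = false;
		sim_do_unwind(t);
	}
}

/* Leaving the kernel invalidates the cached user stack trace. */
void sim_return_to_user(struct sim_task *t)
{
	t->unwind_cached = false;
}
```

In this model, a request followed by several sim_schedule() calls runs the
unwind once; a later request is ignored until sim_return_to_user() has
been called. The model is single-threaded, so it does not cover the
wrinkle of @prev waking up on another CPU while @next is unwinding it.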