On Fri, Jan 24, 2025 at 04:58:03PM -0500, Steven Rostedt wrote:
> On Thu, 23 Jan 2025 23:13:26 +0100
> Peter Zijlstra <pet...@infradead.org> wrote:
> 
> > -EPONIES, you cannot take faults from the middle of schedule(). They can
> > always use the best effort FP unwind we have today.
> 
> Agreed.
> 
> Now the only thing I could think of is a flag gets set where the task comes
> out of the scheduler and then does the stack trace. It doesn't need to do
> the stack trace before it schedules. As it did just schedule, wherever it
> scheduled must have been in a schedulable context.
> 
> That is, kind of like the task_work flag for entering user space and
> exiting the kernel, could we have a sched_work flag to run after being
> scheduled back (exiting schedule())? Since the task has been picked to run,
> it will not cause latency for other tasks. The work will be done in its
> context. This is no different to the task's accounting than if it does this
> going back to user space. Heck, it would only need to do this once if it
> didn't go back to user space, as the user space stack would be the same.
> That is, if it gets scheduled multiple times, this would only happen on the
> first instance until it leaves the kernel.
> 
> 
>       [ trigger stack trace - set sched_work ]
> 
>       schedule() {
>               context_switch() -> CPU runs some other task
>                                <- gets scheduled back onto the CPU
>               [..]
>               /* preemption enabled ... */
>               if (sched_work) {
>                       do stack trace() // can schedule here but calls a
>                                        // schedule function that does not
>                                        // do sched_work to prevent recursion
>               }
>       }
> 
> Could something like this work?

Yeah, this is basically a more fleshed out version of what I was trying
to propose.

One additional wrinkle is that if @prev wakes up on another CPU while
@next is unwinding it, the unwind goes haywire.  So that would maybe
need to be prevented.
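
For illustration only, here is a rough userspace model of the sched_work
flow described above. It is a sketch, not kernel code: struct task,
request_unwind(), schedule_sim(), schedule_no_work() and the counter are
all made-up names, the context switch is a stub, and clearing the flag
before the unwind is my assumption about how "only once until it leaves
the kernel" might be done. It does not model the @prev wakeup wrinkle.

```c
#include <stdbool.h>

/* Hypothetical per-task state; this is not the kernel's task_struct. */
struct task {
	bool sched_work;    /* deferred unwind has been requested */
	int  unwinds_done;  /* how many times the deferred unwind ran */
};

/* Stub for context_switch(): the task is switched out, then resumes here. */
static void switch_away_and_back(struct task *t)
{
	(void)t;
}

/* Variant of schedule() that never runs sched_work, to prevent recursion. */
static void schedule_no_work(struct task *t)
{
	switch_away_and_back(t);
}

/* Stand-in for the stack trace; it may sleep, so it may reschedule. */
static void do_stack_trace(struct task *t)
{
	schedule_no_work(t);  /* e.g. blocking while a fault is serviced */
	t->unwinds_done++;
}

/* [ trigger stack trace - set sched_work ] */
static void request_unwind(struct task *t)
{
	t->sched_work = true;
}

/* Model of schedule(): run the deferred work after being scheduled back. */
static void schedule_sim(struct task *t)
{
	switch_away_and_back(t);

	/* Back on the CPU with preemption enabled: a schedulable context. */
	if (t->sched_work) {
		t->sched_work = false;  /* assumption: clear first, run once */
		do_stack_trace(t);
	}
}
```

In this model, calling request_unwind() and then schedule_sim() several
times runs the unwind on the first schedule_sim() only; later calls do
nothing until request_unwind() is called again.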

-- 
Josh
