On Fri, Jan 24, 2025 at 02:46:48PM -0800, Josh Poimboeuf wrote:
> On Fri, Jan 24, 2025 at 04:58:03PM -0500, Steven Rostedt wrote:
> > Now the only thing I could think of is a flag gets set where the task comes
> > out of the scheduler and then does the stack trace. It doesn't need to do
> > the stack trace before it schedules. As it did just schedule, wherever it
> > scheduled must have been in a schedulable context.
> > 
> > That is, kind of like the task_work flag for entering user space and
> > exiting the kernel, could we have a sched_work flag to run after being
> > scheduled back (exiting schedule()). Since the task has been picked to run,
> > it will not cause latency for other tasks. The work will be done in its
> > context. This is no different to the task's accounting than if it does this
> > going back to user space. Heck, it would only need to do this once if it
> > didn't go back to user space, as the user space stack would be the same.
> > That is, if it gets scheduled multiple times, this would only happen on the
> > first instance until it leaves the kernel.
> > 
> > 
> >     [ trigger stack trace - set sched_work ]
> > 
> >     schedule() {
> >             context_switch() -> CPU runs some other task
> >                              <- gets scheduled back onto the CPU
> >             [..]
> >             /* preemption enabled ... */
> >             if (sched_work) {
> >                     do stack trace() // can schedule here but
> >                                      // calls a schedule function that does not
> >                                      // do sched_work to prevent recursion
> >             }
> >     }
> > 
> > Could something like this work?
> 
> Yeah, this is basically a more fleshed out version of what I was trying
> to propose.
> 
> One additional wrinkle is that if @prev wakes up on another CPU while
> @next is unwinding it, the unwind goes haywire.  So that would maybe
> need to be prevented.

Hm, reading this again I'm wondering if you're actually proposing that
the unwind happens on @prev after it gets rescheduled sometime in the
future?  Does that actually solve the issue?  What if it doesn't get
rescheduled within a reasonable amount of time?
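
To make the question concrete, here is a toy userspace model of the flag
logic as I read the proposal.  Every name in it (toy_task, sched_work,
in_sched_work, toy_request_unwind, toy_schedule_return) is hypothetical
and none of it is real kernel code.  It only sketches the control flow,
with a counter standing in for the actual unwind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; these fields do not exist in the kernel. */
struct toy_task {
	bool sched_work;	/* set when a stack trace is requested */
	bool in_sched_work;	/* true while the deferred work is running */
	int unwinds;		/* how many times the "unwind" has run */
};

static void toy_schedule_return(struct toy_task *t);

/* Trigger point: only records that an unwind is wanted. */
static void toy_request_unwind(struct toy_task *t)
{
	t->sched_work = true;
}

/* Stand-in for the unwind.  It may "schedule" again while running. */
static void toy_unwind_user_stack(struct toy_task *t)
{
	t->unwinds++;
	toy_schedule_return(t);	/* nested schedule must not redo the work */
}

/* Models the tail of schedule(), once the task is back on a CPU. */
static void toy_schedule_return(struct toy_task *t)
{
	if (!t->sched_work || t->in_sched_work)
		return;

	t->sched_work = false;	/* run once per request */
	t->in_sched_work = true;
	toy_unwind_user_stack(t);
	t->in_sched_work = false;
}

/* Walks through one request followed by two schedule-backs. */
static int toy_demo(void)
{
	struct toy_task t = { 0 };

	toy_request_unwind(&t);
	assert(t.unwinds == 0);	/* nothing happens until scheduled back */

	toy_schedule_return(&t);	/* first time back: unwind runs */
	toy_schedule_return(&t);	/* later schedules: flag already clear */

	return t.unwinds;
}
```

In this model the request alone does nothing: the count stays at zero
until toy_schedule_return() is called, which is the window I'm asking
about.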

-- 
Josh
