On Thu, Jan 23, 2025 at 10:43:05AM -0800, Josh Poimboeuf wrote:
> On Thu, Jan 23, 2025 at 09:25:34AM +0100, Peter Zijlstra wrote:
> > On Wed, Jan 22, 2025 at 08:05:33PM -0800, Josh Poimboeuf wrote:
> >
> > > However... would it be a horrible idea for 'next' to unwind 'prev' after
> > > the context switch???
> >
> > The idea isn't terrible, but it will be all sorta of tricky.
> >
> > The big immediate problem is that the CPU doing the context switch
> > looses control over prev at:
> >
> >   __schedule()
> >     context_switch()
> >       finish_task_switch()
> >         finish_task()
> >           smp_store_release(&prev->on_cpu, 0);
> >
> > And this is before we drop rq->lock.
> >
> > The instruction after that store another CPU is free to claim the task
> > and run with it. Notably, another CPU might already be spin waiting on
> > that state, trying to wake the task back up.
> >
> > By the time we get to a schedulable context, @prev is completely out of
> > bounds.
>
> Could unwind_deferred_request() call migrate_disable() or so?

That's pretty vile... and might cause performance issues. You really
don't want things to magically start behaving differently just because
you're tracing.

> How bad would it be to set some bit in @prev to prevent it from getting
> rescheduled until the unwind from @next has been done? Unfortunately
> two tasks would be blocked on the unwind instead of one.

Yeah, not going to happen. Those paths are complicated enough as is.

> BTW, this might be useful for another reason. In Steve's sframe meeting
> yesterday there was some talk of BPF needing to unwind from
> sched-switch, without having to wait indefinitely for @prev to get
> rescheduled and return to user.

-EPONIES, you cannot take faults from the middle of schedule(). They can
always use the best effort FP unwind we have today.
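
For illustration only, the on_cpu handoff quoted above can be modelled in
userspace. This is a rough sketch, not kernel code: C11 atomics and pthreads
stand in for smp_store_release() and the waker's acquire-side spin, and every
name in it (struct toy_task, switching_cpu(), waking_cpu(), run_handoff()) is
made up for the example. It only shows why nothing may touch @prev after the
release store: the spinning side claims and modifies the task right away.

```c
/* Userspace model only; all names below are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct toy_task {
	atomic_int on_cpu;	/* 1 while the switching CPU still owns the task */
	int saved_state;	/* stands in for the task's saved context */
	int seen_by_waker;	/* what the waking CPU observed when claiming it */
};

/* Models the tail of the context switch on the CPU that ran prev. */
static void *switching_cpu(void *arg)
{
	struct toy_task *prev = arg;

	prev->saved_state = 42;	/* last write while we still own prev */
	atomic_store_explicit(&prev->on_cpu, 0, memory_order_release);
	/* From here on prev is off limits: the waker may already be running it. */
	return NULL;
}

/* Models another CPU spin waiting to wake the task back up. */
static void *waking_cpu(void *arg)
{
	struct toy_task *p = arg;

	while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
		;	/* spin until the previous owner lets go */

	p->seen_by_waker = p->saved_state;	/* ordered after the release store */
	p->saved_state = 7;			/* the task runs again and changes */
	return NULL;
}

/* Runs one handoff; returns what the waker saw, stores the final state. */
int run_handoff(int *final_state)
{
	struct toy_task t;
	pthread_t sw, wk;
	int ret;

	atomic_init(&t.on_cpu, 1);
	t.saved_state = 0;
	t.seen_by_waker = 0;

	/* Start the waker first so it is typically already spinning. */
	ret = pthread_create(&wk, NULL, waking_cpu, &t);
	assert(ret == 0);
	ret = pthread_create(&sw, NULL, switching_cpu, &t);
	assert(ret == 0);

	pthread_join(sw, NULL);
	pthread_join(wk, NULL);

	*final_state = t.saved_state;
	return t.seen_by_waker;
}
```

In this model the waker always sees the state written before the release
store, and by the time both threads are joined that state has already been
overwritten, which is the userspace analogue of @prev being out of bounds.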