(+ Xuewen Yan, Ke Wang)

Hello Tobias,

On 2/28/2024 9:40 PM, Tobias Huschle wrote:
> The previously used CFS scheduler gave tasks that were woken up an
> enhanced chance to see runtime immediately by deducting a certain value
> from its vruntime on runqueue placement during wakeup.
>
> This property was used by some, at least vhost, to ensure, that certain
> kworkers are scheduled immediately after being woken up. The EEVDF
> scheduler, does not support this so far. Instead, if such a woken up
> entity carries a negative lag from its previous execution, it will have
> to wait for the current time slice to finish, which affects the
> performance of the process expecting the immediate execution negatively.
>
> To address this issue, implement EEVDF strategy #2 for rejoining
> entities, which dismisses the lag from previous execution and allows
> the woken up task to run immediately (if no other entities are deemed
> to be preferred for scheduling by EEVDF).
>
> The vruntime is decremented by an additional value of 1 to make sure,
> that the woken up tasks gets to actually run. This is of course not
> following strategy #2 in an exact manner but guarantees the expected
> behavior for the scenario described above. Without the additional
> decrement, the performance goes south even more. So there are some
> side effects I could not get my head around yet.
>
> Questions:
> 1. The kworker getting its negative lag occurs in the following scenario
>    - kworker and a cgroup are supposed to execute on the same CPU
>    - one task within the cgroup is executing and wakes up the kworker
>    - kworker with 0 lag, gets picked immediately and finishes its
>      execution within ~5000ns
>    - on dequeue, kworker gets assigned a negative lag
>    Is this expected behavior? With this short execution time, I would
>    expect the kworker to be fine.
>    For a more detailed discussion on this symptom, please see:
>    https://lore.kernel.org/netdev/ZWbapeL34Z8AMR5f@DESKTOP-2CCOB1S./T/

Does the lag clamping patch from Xuewen Yan [1] work for the vhost case
mentioned in the thread? Instead of placing the task just behind the
0-lag point, clamping the lag seems to be a more principled approach,
since EEVDF already does it in update_entity_lag().

If the lag is still too large, maybe the above coupled with Peter's
delayed dequeue patch can help [2] (Note: tree is prone to force
updates).

[1] https://lore.kernel.org/lkml/20240130080643.1828-1-xuewen....@unisoc.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e62ef63a888c97188a977daddb72b61548da8417

> 2. The proposed code change of course only addresses the symptom. Am I
>    assuming correctly that this is in general the expected behavior and
>    that the task waking up the kworker should rather do an explicit
>    reschedule of itself to grant the kworker time to execute?
>    In the vhost case, this is currently attempted through a cond_resched
>    which is not doing anything because the need_resched flag is not set.
>
> Feedback and opinions would be highly appreciated.
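
To make the comparison above concrete, here is a rough userspace sketch
of the two placements being discussed. It is illustration only: the
helper names (clamp_vlag, place_clamped, place_nolag) and the plain
"limit" argument are made up for this mail, they are not the code from
[1], and the load-based scaling that place_entity() applies to the lag
is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp v into the closed range [lo, hi]. */
static int64_t clamp_s64(int64_t v, int64_t lo, int64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Hypothetical helper: bound a saved lag to +/- limit.
 * "limit" is a stand-in for whatever bound the scheduler derives;
 * how it is computed is deliberately not modelled here.
 */
static int64_t clamp_vlag(int64_t vlag, int64_t limit)
{
	assert(limit >= 0);
	return clamp_s64(vlag, -limit, limit);
}

/* Placement that keeps the (clamped) lag: vruntime = V - lag. */
static int64_t place_clamped(int64_t avg_vruntime, int64_t vlag, int64_t limit)
{
	return avg_vruntime - clamp_vlag(vlag, limit);
}

/*
 * Placement as in the quoted NOLAG_WAKEUP hunk: the saved lag is
 * dropped and lag is forced to 1, so the entity lands one unit
 * before the 0-lag point regardless of its history.
 */
static int64_t place_nolag(int64_t avg_vruntime)
{
	return avg_vruntime - 1;
}
```

With V = 1000 and a saved lag of -5000, a limit of 300 places the
entity at 1300 in this model, so it still pays for part of what it
overran, while the NOLAG_WAKEUP variant places it at 999 and forgets
the overrun completely.
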
>
> Signed-off-by: Tobias Huschle <husc...@linux.ibm.com>
> ---
>  kernel/sched/fair.c     | 5 +++++
>  kernel/sched/features.h | 1 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 533547e3c90a..c20ae6d62961 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5239,6 +5239,11 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  		lag = div_s64(lag, load);
>  	}
>  
> +	if (sched_feat(NOLAG_WAKEUP) && (flags & ENQUEUE_WAKEUP)) {
> +		se->vlag = 0;
> +		lag = 1;
> +	}
> +
>  	se->vruntime = vruntime - lag;
>  
>  	/*
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 143f55df890b..d3118e7568b4 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -7,6 +7,7 @@
>  SCHED_FEAT(PLACE_LAG, true)
>  SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
>  SCHED_FEAT(RUN_TO_PARITY, true)
> +SCHED_FEAT(NOLAG_WAKEUP, true)
>  
>  /*
>   * Prefer to schedule the task we woke last (assuming it failed

--
Thanks and Regards,
Prateek