2025-04-25T06:35:09Z Nam Cao <[email protected]>:

> On Thu, Apr 24, 2025 at 03:55:34PM +0200, Gabriele Monaco wrote:
>> I've been playing with these monitors; code-wise they look good.
>> I tested a bit and they seem to work without many surprises by doing
>> something as simple as:
>>
>> perf stat -e rv:error_sleep stress-ng --cpu-sched 1 -t 10s
>> -- shows several errors --
>
> This one is a monitor's bug.
>
> The monitor mistakenly sees the task getting woken up, *then* sees it
> going to sleep.
>
> This is due to trace_sched_switch() being called with a stale
> 'prev_state'. 'prev_state' is read at the beginning of __schedule(), but
> trace_sched_switch() is invoked a bit later. Therefore, if task->__state
> is changed in between, 'prev_state' no longer matches task->__state.
>
> The monitor checks (prev_state & TASK_INTERRUPTIBLE) to determine whether
> the task is going to sleep. This can be incorrect due to the race above:
> the monitor sees the task going to sleep, when actually it is just
> preempted.
>
> I think this also answers the race you observed with the srs monitor?
Yeah, that could be the culprit. Peter's fix [1] landed on next recently; I
guess in a couple of days you'll get it on the upstream tree and you may not
see the problem.
Nevertheless, I didn't check exactly what --cpu-sched does, but I expect it
to do all sorts of wild things to stress the scheduler, so it may be normal
to get more errors than with, say, --cyclic, which only runs nanosleeps.

>> perf stat -e rv:error_sleep stress-ng --prio-inv 1 --prio-inv-policy rr
>> -- shows only 1 error (normal while starting the program?) --
>>
>> Not quite sound, but does it look like a reasonable test to you?
>
> The above command uses mutexes with priority inheritance. That is good
> for real-time. The errors are due to real-time tasks being delayed by
> waitpid().
>
> Priority inheritance can be disabled with "--prio-inv-type none". Then
> you will see lots of errors with mutexes.

Great, that's exactly what I wanted to know, thanks.

>> I quickly tried the same with the other monitor, comparing the number of
>> errors with the page faults reported by perf, but that didn't make too
>> much sense. Perhaps I'm doing something wrong here, though (the number
>> reported by perf for page faults feels a bit too high).
>>
>> perf stat -e page-faults -e rv:error_pagefault stress-ng --cyclic 1
>
> This command runs a non-real-time thread to do setup, and a cyclic
> real-time thread. The number of page faults of each thread would be
> roughly proportional to the code size executed by each thread. As the
> non-real-time thread's code size is bigger, it sounds reasonable that the
> number of page faults is greater than the number of monitor's warnings.

Mmh, I guessed something like that, although the numbers were a bit out of
proportion (e.g. 500 page faults vs. 8 errors), but again, I didn't check
too carefully what happens under the hood.
>>
>> Anyway, the monitor looks good to me.
>>
>> Reviewed-by: Gabriele Monaco <[email protected]>
>>
>> But it'd be nice if you have tips to share on how to quickly test it
>> (e.g. without writing a custom workload).
>
> I tested the monitor on a real system. My system has some real-time audio
> processing processes (pipewire, firefox running youtube); yours should
> too.

That's a good point. Also, I didn't mention I was running these tests in a
VM (virtme-ng), so the system stress is minimal and perhaps the setup
triggers some different oddities (filesystems are overlays, and some other
things are set up differently from a real system).

>
> But thanks so much for testing with stress-ng. My testing didn't stress
> the system enough for the above race to happen. I will give stress-ng a
> few runs before the next version.
>
> Best regards,
> Nam

Thank you for the tips!

Cheers,
Gabriele

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8feb053d53194382fcfb68231296fdc220497ea6
