‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, April 2, 2020 2:35 PM, Martin Pieuchot wrote:
> On 02/04/20(Thu) 13:59, Martin wrote:
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, April 2, 2020 1:21 PM, Martin Pieuchot wrote:
> >
> > > On 02/04/20(Thu) 12:58, Martin wrote:
> > >
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Tuesday, March 31, 2020 3:27 PM, Martin Pieuchot wrote:
> > > >
> > > > > On 31/03/20(Tue) 15:08, Martin wrote:
> > > > >
> > > > > > 1. top -SH -s .3 shows me that the stutters arrive when a process
> > > > > > changes its state from 'idle' to 'active', with related disk
> > > > > > activity.
> > > > >
> > > > > What about %spin and %intr?
> > > >
> > > > 1. AMD GX-420CA SOC 4-core 4-thread
> > > >
> > > > CPU0 %spin from 2.0% to 17.0% %intr 30.0%-96.0%
> > > > CPU1-3 %spin 0.0% (always) %intr 15.0%-99.0%
> > > >
> > > > 2. i7-2640m 2-core 4-thread
> > > >
> > > > CPU0 %spin from 0.0% to 3.0% %intr 0.0% (always)
> > > > CPU2 %spin from 0.0% to 2.0% (rare) %intr 0.0% (always)
> > >
> > > Interesting, so whatever that is, it seems related to or amplified by
> > > a lot of time spent dealing with interrupts.
> > > You can use "systat -s .3" and/or "vmstat -i" to figure out which
> > > interrupt has a higher rate when you observe the symptoms.
> >
> > 1. AMD SOC
> > The systat -s .3 display stutters too when the system-wide stutter
> > appears.
> > Interrupts
> > 500-1200 total
> > 96-98 clock
> > 155-350, sometimes up to 1100 ipi
> >
>
> A lot of IPIs! We're making progress. This rings a bell. I'd suggest
> you look at my slides/talk from EuroBSDCon2017 called "Your scheduler
> is not the problem". This might not be a similar problem, but it gives
> a lot of insights into how to debug further.

Thank you, I will watch it soon.

>
> Which application are you running to trigger those? What is the
> "background process" that you're talking about? Did you ktrace(1) it?
> What is it doing when you see the stutters?
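The "vmstat -i" suggestion quoted above reports cumulative counters since boot, so comparing two snapshots taken a few seconds apart is one way to see which source (for example the ipi line) spikes during a stutter. The helper below is a minimal sketch, not an OpenBSD tool: the name `irqdelta` is hypothetical, and it assumes the `vmstat -i` layout of a header line followed by rows whose first column is the interrupt name and whose second column is the cumulative count.

```shell
#!/bin/sh
# irqdelta OLD NEW SECONDS
# Hypothetical helper (not part of OpenBSD): reads two saved `vmstat -i`
# snapshots taken SECONDS apart and prints how many interrupts per second
# each source raised in between. Assumes column 1 is the interrupt name and
# column 2 its cumulative count; the header line is skipped because its
# second column is not a number.
irqdelta() {
	awk -v secs="$3" '
	# First file: remember each cumulative count, keyed by interrupt name.
	NR == FNR { if ($2 ~ /^[0-9]+$/) old[$1] = $2; next }
	# Second file: print the per-second rate for names present in both.
	$2 ~ /^[0-9]+$/ && ($1 in old) {
		printf "%-20s %10.1f/s\n", $1, ($2 - old[$1]) / secs
	}' "$1" "$2"
}
```

For example, run `vmstat -i > /tmp/irq.before; sleep 10; vmstat -i > /tmp/irq.after` while a stutter is happening, then `irqdelta /tmp/irq.before /tmp/irq.after 10`. Averaging over several seconds smooths out the single-frame jumps that a fast-refreshing `systat` view shows.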
First of all, I used dump/restore to copy the whole system from the AMD SOC to the i7 laptop, with all the same userland software running. As you can see, the i7 doesn't show a lot of IPIs with exactly the same kernel and userland, even though the system configuration is the same, and the stutters are much rarer on the i7.

Anyway, I disabled the software from packages on the AMD SOC step by step to find out which program causes it. The stutters are present anyway, so unfortunately I don't know which package to ktrace.

Secondly, I thought this behavior was caused by the connected USB devices (I have more than 12 USB2 devices attached to the system simultaneously). I disconnected them all from the AMD SOC, with no result.

Thirdly, I disabled the package scripts from loading at boot. It is slightly better, but the stutters are still present on the AMD.

The _same_ AMD SOC ran for about a year with 6.4 installed and had no stutters, so I think something changed between 6.4 and 6.6. I upgraded 6.4 to 6.5 and then to 6.6. I didn't test the system on 6.5 before upgrading to the next release, so I can't figure out in which version the 'bug' appeared.

In the end I did a fresh 6.6 install on both machines and got the same behavior, but the stutters are a bit 'shorter' when no userland software is running and the USB devices are disconnected.

>
> The picture now seems to be clearer: something is causing a high number
> of IPIs. That creates latency, and all other tasks are somehow delayed,
> resulting in some stuttering.
>
> The question now becomes: why are so many IPIs being generated, and is it
> possible to lower the insanely high rate?

Can you briefly explain what an IPI is?

>
> Please make sure to do the ktracing first, that should give us the
> userland view of the situation. Then you could additionally do the
> Flamegraph gathering, which should give us the kernel view of the situation.

I'll try to do it as you ask.

>
> > > If nobody has an idea of what that could be, another useful piece of
> > > information would be to produce a flamegraph when you observe the
> > > stutters. For that you need to enable dt(4) in conf/GENERIC, build &
> > > install a new kernel, build & install btrace(8) and set kern.allowdt=1
> > > in /etc/sysctl.conf.
> > > After rebooting into the new kernel, run the following:
> > >
> > > btrace -e 'profile:hz:15 { printf("%s1\n", kstack); }' > kstack.txt
> > >
> > > and hit Ctrl+C to stop the profiling.
> > > Then you can build the Flamegraph with the tools described below or
> > > provide us the captured stack traces:
> > > https://github.com/brendangregg/FlameGraph

Martin
