‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, April 2, 2020 2:35 PM, Martin Pieuchot <[email protected]> wrote:

> On 02/04/20(Thu) 13:59, Martin wrote:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Thursday, April 2, 2020 1:21 PM, Martin Pieuchot [email protected] wrote:
> >
> > > On 02/04/20(Thu) 12:58, Martin wrote:
> > >
> > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > On Tuesday, March 31, 2020 3:27 PM, Martin Pieuchot [email protected] 
> > > > wrote:
> > > >
> > > > > On 31/03/20(Tue) 15:08, Martin wrote:
> > > > >
> > > > > > 1.  top -SH -s .3 shows that the stutters occur when a process
> > > > > > changes its state from 'idle' to 'active', with related disk
> > > > > > activity.
> > > > >
> > > > > What about %spin and %intr?
> > > >
> > > > 1.  AMD GX-420CA SOC 4-core 4-thread
> > > >
> > > > CPU0 %spin from 2.0% to 17.0% %intr 30.0%-96.0%
> > > > CPU1-3 %spin 0.0% (always) %intr 15.0%-99.0%
> > > >
> > > > 2.  i7-2640m 2-core 4-thread
> > > >
> > > > CPU0 %spin from 0.0% to 3.0% %intr 0.0% (always)
> > > > CPU2 %spin from 0.0% to 2.0% (rare) %intr 0.0% (always)
> > >
> > > Interesting, so whatever it is, it seems related to or amplified by
> > > a lot of time spent handling interrupts.
> > > You can use "systat -s .3" and/or "vmstat -i" to figure out which
> > > interrupt has a higher rate when you observe the symptoms.
> >
> > 1.  AMD SOC
> >     systat -s .3 itself seems to be interrupted (it stutters) when the
> > system-wide stutter appears.
> >     Interrupts
> >     500-1200 total
> >     96-98 clock
> >     155-350, sometimes up to 1100 ipi
> >
>
> A lot of IPIs! We're making progress. This rings a bell, I'd suggest
> you look at my slides/talk from EuroBSDCon2017 called: "Your scheduler
> is not the problem". This might not be a similar problem but it gives
> a lot of insight into how to debug further.

Thank you. I will watch it soon.

>
> Which application are you running to trigger those? What is the
> "background process" that you're talking about? Did you ktrace(1) it?
> What is it doing when you see the stutters?

First of all, I dumped and restored the whole system from the AMD SOC to the
i7 laptop, so both run the same userland software. As you can see, the i7
doesn't show a lot of IPIs with exactly the same kernel, userland and system
configuration, and the stutters are much rarer on it.
I also disabled all the software from packages, step by step, on the AMD SOC
to find out which program is responsible. The stutters are still present, so
unfortunately I don't know which software package to ktrace.

Secondly, I thought this behavior was caused by the connected USB devices (I
have more than 12 USB2 devices attached to the system simultaneously). I
disconnected them all from the AMD SOC. No change.
Thirdly, I disabled the package scripts from loading. That is slightly better,
but the stutters are still present on AMD.

The _same_ AMD SOC ran for about a year with 6.4 installed, with no stutters.
So I think something changed 6.4 -> 6.6. I upgraded 6.4 to 6.5 and then to
6.6. I didn't test the system on 6.5 before upgrading to the next release, so
I can't tell in which version the 'bug' appeared.

In the end I installed a fresh 6.6 on both machines and got the same behavior.
The stutters are a bit 'shorter', though, when no userland software is running
and the USB devices are disconnected.

>
> The picture now seems to be clearer: something is causing a high number
> of IPIs. That creates latency and all other tasks are somehow delayed,
> resulting in some stuttering.
>
> The question now becomes: why are so many IPIs being generated, and is
> it possible to lower the insanely high rate?

Can you explain in a few words what an IPI is?
>
> Please make sure to do the ktracing first, that should give us the
> userland view of the situation. Then you could additionally do the
> Flamegraph gathering, which should give us the kernel view of the
> situation.

I will try to do as you ask.

>
> > > If nobody has an idea of what that could be, another useful piece of
> > > information would be a flamegraph captured when you observe the
> > > stutters. For that you need to enable dt(4) in conf/GENERIC, build &
> > > install a new kernel, build & install btrace(8) and set kern.allowdt=1
> > > in /etc/sysctl.conf. After rebooting into the new kernel run the
> > > following:
> > > btrace -e 'profile:hz:15 { printf("%s1\n", kstack); }' > kstack.txt
> > > and hit Ctrl+C to stop the profiling.
> > > Then you can build the Flamegraph with the tools described below or
> > > provide us the captured stack traces:
> > > https://github.com/brendangregg/FlameGraph
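
To follow the "vmstat -i" suggestion quoted above, two saved snapshots could
be compared with something like the sketch below. It is only a sketch: the
function name irq_delta and the snapshot files are hypothetical, and it
assumes that "vmstat -i" prints one "name total rate" line per interrupt
source, between an "interrupt" header line and a final "Total" line, with no
spaces inside the source names. The idea is that the difference of the
running totals shows what happened during the stutter itself, rather than a
longer-term average.

```shell
#!/bin/sh
# irq_delta BEFORE AFTER
# Print how much each interrupt counter grew between two saved
# "vmstat -i" snapshots, largest increase first.
# Assumed line layout (not verified on every release): name total rate
irq_delta() {
	awk '
	    # Skip the header line and the summary line in both files.
	    $1 == "interrupt" || $1 == "Total" { next }
	    # Skip anything that does not look like "name total rate".
	    NF < 3 { next }
	    # First file: remember the running total of each source.
	    NR == FNR { before[$1] = $(NF - 1); next }
	    # Second file: print the increase since the first snapshot.
	    { print $(NF - 1) - before[$1], $1 }
	' "$1" "$2" | sort -rn
}
```

Saving "vmstat -i" output to one file shortly before a stutter and to another
right after it, then running irq_delta on the two files, should put the
busiest interrupt source (ipi, if the numbers above hold) on the first line.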

Martin
