On 10/08/2017 23:39, Paul E. McKenney wrote:
> On Thu, Aug 10, 2017 at 11:45:09AM +0200, Daniel Lezcano wrote:

[ ... ]

>> Nothing comes to mind, but it may be worth mentioning that the slowness
>> of the CPU is the aggravating factor. In particular, I was able to
>> reproduce the issue by setting the CPU to its minimum frequency. With
>> the ondemand governor, the frequency can be high (hence enough CPU
>> power) at the moment we enable the function_graph tracer because
>> another CPU is loaded (and both CPUs share the same clock line). The
>> system became stuck at the moment the other CPU went idle and the
>> frequency dropped to its lowest value. That introduced randomness into
>> the issue and made it hard to figure out why the RCU stall was
>> happening.
> Adding this, then?

Yes, sure.

Thanks Paul.

  -- Daniel

> ------------------------------------------------------------------------
> commit f7d9ce95064f76be583c775fac32076fa59f1617
> Author: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> Date:   Thu Aug 10 14:33:17 2017 -0700
>
>     documentation: Slow systems can stall RCU grace periods
>
>     If a fast system has a worst-case grace-period duration of (say) ten
>     seconds, then running the same workload on a system ten times as slow
>     will get you an RCU CPU stall warning given default stall-warning
>     timeout settings.  This commit therefore adds this possibility to
>     stallwarn.txt.
>
>     Reported-by: Daniel Lezcano <daniel.lezc...@linaro.org>
>     Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
>
> diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
> index 21b8913acbdf..238acbd94917 100644
> --- a/Documentation/RCU/stallwarn.txt
> +++ b/Documentation/RCU/stallwarn.txt
> @@ -70,6 +70,12 @@ o  A periodic interrupt whose handler takes longer than the time
>       considerably longer than normal, which can in turn result in
>       RCU CPU stall warnings.
> +o    Testing a workload on a fast system, tuning the stall-warning
> +     timeout down to just barely avoid RCU CPU stall warnings, and then
> +     running the same workload with the same stall-warning timeout on a
> +     slow system.  Note that thermal throttling and on-demand governors
> +     can cause a single system to be sometimes fast and sometimes slow!
> +
>  o    A hardware or software issue shuts off the scheduler-clock
>       interrupt on a CPU that is not in dyntick-idle mode.  This
>       problem really has happened, and seems to be most likely to
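
For anyone reading this later, the arithmetic in the commit message can be
sketched in a few lines of C. This is only a rough model, not kernel code:
the helper names are hypothetical, the slowdown is treated as a simple
integer multiplier, and the 21-second value is assumed to be the default
of CONFIG_RCU_CPU_STALL_TIMEOUT.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: default CONFIG_RCU_CPU_STALL_TIMEOUT, in seconds. */
#define ASSUMED_STALL_TIMEOUT_SECS 21U

/*
 * Hypothetical helper: worst-case grace-period duration (seconds) once the
 * system runs 'slowdown' times slower than where it was measured.
 */
static unsigned int scaled_gp_secs(unsigned int gp_secs, unsigned int slowdown)
{
	assert(slowdown >= 1);	/* 1 means "same speed as the fast system" */
	return gp_secs * slowdown;
}

/*
 * Hypothetical helper: true if the scaled grace period outlasts the
 * stall-warning timeout, in which case a stall warning would be expected.
 */
static bool would_warn(unsigned int gp_secs, unsigned int slowdown,
		       unsigned int timeout_secs)
{
	return scaled_gp_secs(gp_secs, slowdown) > timeout_secs;
}
```

With these assumptions, would_warn(10, 10, ASSUMED_STALL_TIMEOUT_SECS) is
true (100 seconds against a 21-second timeout), which is the case in the
commit message. The new stallwarn.txt item corresponds to something like
would_warn(10, 2, 11): a timeout tuned down to just above the fast system's
worst case is exceeded as soon as the frequency halves.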

 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
