On 09/01/18 16:50, Rafael J. Wysocki wrote:
> On Tue, Jan 9, 2018 at 3:43 PM, Leonard Crestez <leonard.cres...@nxp.com>
> wrote:
[...]
> > Every 4 seconds (really it's /proc/sys/kernel/watchdog_thresh * 2 / 5
> > and watchdog_thresh defaults to 10). There is a per-cpu hrtimer which
> > wakes the per-cpu thread in order to check that tasks can still
> > execute, this works very well against bugs like infinite loops in
> > softirq mode. The timers are synchronized initially but can get
> > staggered (for example by hotplug).
> >
> > My guess is that it's only marked RT so that it executes ahead of other
> > threads and the watchdog doesn't trigger simply when there are lots of
> > userspace tasks.
>
> I think so too.
>
> I see a couple of more-or-less hackish ways to avoid the issue, but
> nothing particularly attractive ATM.
>
> I wouldn't change the general behavior with respect to RT tasks
> because of this, though, as we would quickly find a case in which that
> would turn out to be not desirable.

I agree we cannot generalize to all RT tasks, but what Patrick proposed
(clamping the utilization of certain known tasks) might help here:

lkml.kernel.org/r/20170824180857.32103-1-patrick.bell...@arm.com

Maybe with a per-task interface instead of using cgroups?

The other option would be to relax the affinity constraints on DL tasks,
so that a case like this could be handled. Daniel and Tommaso have
proposed possible approaches; this might be a driving use case. I'm not
sure how we would come up with a proper runtime for the watchdog, though.

Best,

- Juri