On Wed, Oct 24, 2018 at 1:03 PM, Juri Lelli <juri.le...@redhat.com> wrote: > > On 19/10/18 22:50, luca abeni wrote: > > On Fri, 19 Oct 2018 13:39:42 +0200 > > Peter Zijlstra <pet...@infradead.org> wrote: > > > > > On Thu, Oct 18, 2018 at 01:08:11PM +0200, luca abeni wrote: > > > > Ok, I see the issue now: the problem is that the "while > > > > (dl_se->runtime <= 0)" loop is executed at replenishment time, but > > > > the deadline should be postponed at enforcement time. > > > > > > > > I mean: in update_curr_dl() we do: > > > > dl_se->runtime -= scaled_delta_exec; > > > > if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) { > > > > ... > > > > enqueue replenishment timer at dl_next_period(dl_se) > > > > But dl_next_period() is based on a "wrong" deadline! > > > > > > > > > > > > I think that inserting a > > > > while (dl_se->runtime <= -pi_se->dl_runtime) { > > > > dl_se->deadline += pi_se->dl_period; > > > > dl_se->runtime += pi_se->dl_runtime; > > > > } > > > > immediately after "dl_se->runtime -= scaled_delta_exec;" would fix > > > > the problem, no? > > > > > > That certainly makes sense to me. > > > > Good; I'll try to work on this idea in the weekend. > > So, we (me and Luca) managed to spend some more time on this and found a > few more things worth sharing. I'll try to summarize what we have got so > far (including what already discussed) because impression is that each > point might deserve a fix or at least consideration (just amazing how a > simple random fuzzer thing can highlight all that :).
1. Fuzzing finds bugs in any code. Always. If a code wasn't fuzzed, there are bugs. 2. This fuzzer is not so simple ;) > Apologies for the > long email. > > Reproducer runs on a CONFIG_HZ=100, CONFIG_IRQ_TIME_ACCOUNTING kernel > and does something like this (only the bits that seems to matter here) > > int main(void) > { > [...] > [setup stuff at 0x2001d000] > syscall(__NR_perf_event_open, 0x2001d000, 0, -1, -1, 0); > *(uint32_t*)0x20000000 = 0; > *(uint32_t*)0x20000004 = 6; > *(uint64_t*)0x20000008 = 0; > *(uint32_t*)0x20000010 = 0; > *(uint32_t*)0x20000014 = 0; > *(uint64_t*)0x20000018 = 0x9917; <-- ~40us > *(uint64_t*)0x20000020 = 0xffff; <-- ~65us (~60% bandwidth) > *(uint64_t*)0x20000028 = 0; > syscall(__NR_sched_setattr, 0, 0x20000000, 0); > [busy loop] > return 0; > } > > And this causes problems because the task is actually never throttled. > > Pain points: > > 1. Granularity of enforcement (at each tick) is huge compared with > the task runtime. This makes starting the replenishment timer, > when runtime is depleted, always to fail (because old deadline > is way in the past). So, the task is fully replenished and put > back to run. > > - Luca's proposal should help here, since the deadline is postponed > at throttling time, and replenishment timer set to that (and it > should be in the future) > > 1.1 Even if we fix 1. in a configuration like this, the task would > still be able to run for ~10ms (worst case) and potentially starve > other tasks. It doesn't seem a too big interval maybe, but there > might be other very short activities that might miss an occasion > to run "quickly". > > - Might be fixed by imposing (via sysctl) reasonable defaults for > minimum runtime (w.r.t. HZ, like HZ/2) and maximum for period > (as also a very small bandwidth task can have a big runtime if > period is big as well) > > (1.2) When runtime becomes very negative (because delta_exec was big) > we seem to spend lot of time inside the replenishment loop. > > - Not sure it's such a big problem, might need more profiling. > Feeling is that once the other points will be addressed this > won't matter anymore > > 2. This is related to perf_event_open syscall reproducer does before > becoming DEADLINE and entering the busy loop. Enabling of perf > swevents generates lot of hrtimers load that happens in the > reproducer task context. Now, DEADLINE uses rq_clock() for setting > deadlines, but rq_clock_task() for doing runtime enforcement. > In a situation like this it seems that the amount of irq pressure > becomes pretty big (I'm seeing this on kvm, real hw should maybe do > better, pain point remains I guess), so rq_clock() and > rq_clock_task() might become more a more skewed w.r.t. each other. > Since rq_clock() is only used when setting absolute deadlines for > the first time (or when resetting them in certain cases), after a > bit the replenishment code will start to see postponed deadlines > always in the past w.r.t. rq_clock(). And this brings us back to the > fact that the task is never stopped, since it can't keep up with > rq_clock(). > > - Not sure yet how we want to address this [1]. We could use > rq_clock() everywhere, but tasks might be penalized by irq > pressure (theoretically this would mandate that irqs are > explicitly accounted for I guess). I tried to use the skew between > the two clocks to "fix" deadlines, but that puts us at risks of > de-synchronizing userspace and kernel views of deadlines. > > 3. HRTICK is not started for new entities. > > - Already got a patch for it. > > This should be it, I hope. Luca (thanks a lot for your help) and please > add or correct me if I was wrong. > > Thoughts? > > Best, > > - Juri > > 1 - > https://elixir.bootlin.com/linux/latest/source/kernel/sched/deadline.c#L1162 > > -- > You received this message because you are subscribed to the Google Groups > "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to syzkaller-bugs+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/syzkaller-bugs/20181024120335.GE29272%40localhost.localdomain. > For more options, visit https://groups.google.com/d/optout.