On Mon, 23 Nov 2020 at 14:55, Peter Zijlstra <[email protected]> wrote:
> On Mon, Nov 23, 2020 at 02:23:00PM +0100, Marco Elver wrote:
> > When enabling KCSAN for kernel/sched (remove KCSAN_SANITIZE := n from
> > kernel/sched/Makefile), with CONFIG_DEBUG_PREEMPT=y, we can observe
> > recursion due to:
> >
> > check_access() [via instrumentation]
> > kcsan_setup_watchpoint()
> > reset_kcsan_skip()
> > kcsan_prandom_u32_max()
> > get_cpu_var()
> > preempt_disable()
> > preempt_count_add() [in kernel/sched/core.c]
> > check_access() [via instrumentation]
> >
> > Avoid this by rewriting kcsan_prandom_u32_max() to only use safe
> > versions of preempt_disable() and preempt_enable() that do not call into
> > scheduler code.
> >
> > Note, while this currently does not affect an unmodified kernel, it'd be
> > good to keep a KCSAN kernel working when KCSAN_SANITIZE := n is removed
> > from kernel/sched/Makefile to permit testing scheduler code with KCSAN
> > if desired.
> >
> > Fixes: cd290ec24633 ("kcsan: Use tracing-safe version of prandom")
> > Signed-off-by: Marco Elver <[email protected]>
> > ---
> > v2:
> > * Update comment to also point out preempt_enable().
> > ---
> > kernel/kcsan/core.c | 15 ++++++++++++---
> > 1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
> > index 3994a217bde7..10513f3e2349 100644
> > --- a/kernel/kcsan/core.c
> > +++ b/kernel/kcsan/core.c
> > @@ -284,10 +284,19 @@ should_watch(const volatile void *ptr, size_t size, int type, struct kcsan_ctx *
> >   */
> >  static u32 kcsan_prandom_u32_max(u32 ep_ro)
> >  {
> > -	struct rnd_state *state = &get_cpu_var(kcsan_rand_state);
> > -	const u32 res = prandom_u32_state(state);
> > +	struct rnd_state *state;
> > +	u32 res;
> > +
> > +	/*
> > +	 * Avoid recursion with scheduler by using non-tracing versions of
> > +	 * preempt_disable() and preempt_enable() that do not call into
> > +	 * scheduler code.
> > +	 */
> > +	preempt_disable_notrace();
> > +	state = raw_cpu_ptr(&kcsan_rand_state);
> > +	res = prandom_u32_state(state);
> > +	preempt_enable_no_resched_notrace();
>
> This is a preemption bug. Does preempt_enable_notrace() not work?
No, it didn't, because with preempt_enable_notrace() we end up calling
preempt_schedule_notrace(), which can again result in recursion.
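For reference, with CONFIG_PREEMPTION, preempt_enable_notrace() expands
to roughly the following (paraphrasing include/linux/preempt.h, so the
exact form may differ):

/* Paraphrased from include/linux/preempt.h (CONFIG_PREEMPTION). */
#define preempt_enable_notrace() \
do { \
	barrier(); \
	if (unlikely(__preempt_count_dec_and_test())) \
		__preempt_schedule_notrace(); \
} while (0)

and __preempt_schedule_notrace() ends up in preempt_schedule_notrace()
in kernel/sched/core.c, which is exactly the code that becomes
instrumented once KCSAN_SANITIZE := n is removed from
kernel/sched/Makefile.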
Normally we could surround this with
kcsan_disable_current()/kcsan_enable_current(), but that doesn't work
here because kcsan_setup_watchpoint() has this sequence:

	reset_kcsan_skip();

	if (!kcsan_is_enabled())
		...

i.e. the skip counter is reset before checking whether KCSAN is
enabled, to avoid underflowing it when KCSAN is disabled. That could be
solved by writing to the skip counter twice: first with a non-random
value, and then, only if KCSAN is enabled, with a random value. Would
that be better?
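For illustration, roughly something like this in
kcsan_setup_watchpoint() (just an untested sketch to show what I mean,
using the existing names from kernel/kcsan/core.c):

	/*
	 * Untested sketch of the "write twice" idea, not an actual patch:
	 * first reset the skip counter deterministically so it cannot
	 * underflow even when KCSAN is disabled, then randomize it only if
	 * KCSAN is enabled, with instrumentation of the current task
	 * suppressed around the preempt_disable() in get_cpu_var().
	 */
	this_cpu_write(kcsan_skip, kcsan_skip_watch);

	if (!kcsan_is_enabled())
		goto out;

	if (IS_ENABLED(CONFIG_KCSAN_SKIP_WATCH_RANDOMIZE)) {
		kcsan_disable_current();
		this_cpu_write(kcsan_skip,
			       kcsan_skip_watch - kcsan_prandom_u32_max(kcsan_skip_watch));
		kcsan_enable_current();
	}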
I'd also like to avoid having to add __no_kcsan annotations to scheduler functions.
Any recommendation?
Thanks,
-- Marco
>
> >
> > -	put_cpu_var(kcsan_rand_state);
> >  	return (u32)(((u64) res * ep_ro) >> 32);
> >  }
> >
> > --
> > 2.29.2.454.gaff20da3a2-goog
> >