On 6/3/20 6:30 pm, Daniel Axtens wrote:
kcov instrumentation is collected by the __sanitizer_cov_trace_pc hook in
kernel/kcov.c. The compiler inserts these hooks into every basic block
unless kcov is disabled for that file.

We then have a deep call-chain:
  - __sanitizer_cov_trace_pc calls check_kcov_mode()
  - check_kcov_mode() (kernel/kcov.c) calls in_task()
  - in_task() (include/linux/preempt.h) calls preempt_count().
  - preempt_count() (include/asm-generic/preempt.h) calls
      current_thread_info()
  - because powerpc has THREAD_INFO_IN_TASK, current_thread_info()
      (include/linux/thread_info.h) is defined to 'current'
  - current (arch/powerpc/include/asm/current.h) is defined to
      get_current().
  - get_current (same file) loads from an offset relative to r13.
  - arch/powerpc/include/asm/paca.h makes r13 a register variable
      called local_paca - it is the PACA for the current CPU, so
      this has the effect of loading the current task from PACA.
  - get_current returns the current task from PACA,
  - current_thread_info returns the task cast to a thread_info
  - preempt_count dereferences the thread_info to load preempt_count
  - that value is used by in_task and so on up the chain
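
The chain above can be mimicked in user space. This is only a rough sketch:
every mock_* name is hypothetical, the structures are reduced to the one
field each step touches, and MOCK_NON_TASK_MASK is a stand-in rather than the
kernel's real mask. It shows why every link depends on r13 already pointing
at a valid PACA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mock_thread_info {
	int preempt_count;
};

struct mock_task {
	/* First member, so a task pointer can be cast to a thread_info
	 * pointer, as with THREAD_INFO_IN_TASK. */
	struct mock_thread_info thread_info;
};

struct mock_paca {
	struct mock_task *current_task;
};

/* Stand-in for the non-task bits tested by in_task(); not the real mask. */
#define MOCK_NON_TASK_MASK 0xff00u

/* Stand-in for register r13 (local_paca). */
static struct mock_paca *mock_r13;

static struct mock_task *mock_get_current(void)
{
	/* The real code has no such check: it simply loads through r13. */
	assert(mock_r13 != NULL);
	return mock_r13->current_task;
}

static struct mock_thread_info *mock_current_thread_info(void)
{
	return (struct mock_thread_info *)mock_get_current();
}

static int mock_preempt_count(void)
{
	return mock_current_thread_info()->preempt_count;
}

static int mock_in_task(void)
{
	return !(mock_preempt_count() & MOCK_NON_TASK_MASK);
}

/* Build a PACA and task, point "r13" at the PACA, then walk the chain. */
int mock_chain_demo(int preempt_count)
{
	struct mock_task task = { .thread_info = { .preempt_count = preempt_count } };
	struct mock_paca paca = { .current_task = &task };
	int result;

	mock_r13 = &paca;
	result = mock_in_task();
	mock_r13 = NULL;
	return result;
}
```

With mock_r13 left unset, the very first load in mock_get_current() would go
through a garbage pointer, which is the failure described below.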

The problem is:

  - kcov instrumentation is enabled for arch/powerpc/kernel/dt_cpu_ftrs.c

  - even if it were not, dt_cpu_ftrs_init calls generic dt parsing code
    which should definitely have instrumentation enabled.

  - setup_64.c calls dt_cpu_ftrs_init before it sets up a PACA.

  - If we don't set up a paca, r13 will contain unpredictable data.

  - In a zImage compiled with kcov and KASAN, we see r13 containing a value
    that leads to dereferencing invalid memory (something like
    912a72603d420015).

  - Weirdly, the same kernel as a vmlinux loaded directly by qemu does not
    crash. Investigating with gdb, it seems that in the vmlinux boot case,
    r13 is near enough to zero that we just happen to be able to read that
    part of memory (we're operating with translation off at this point) and
    the current pointer also happens to land in readable memory and
    everything just works.

  - PACA setup refers to CPU features - setup_paca() looks at
    early_cpu_has_feature(CPU_FTR_HVMODE)

There's no generic kill switch for kcov (as far as I can tell), and we
don't want to have to turn off instrumentation in the generic dt parsing
code (which lives outside arch/powerpc/) just because we don't have a real
paca or task yet.

So:
  - change the test when setting up a PACA to consider the actual value of
    the MSR rather than the CPU feature.

  - move the PACA setup to before the cpu feature parsing.
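
As a rough user-space sketch of those two changes (not the patch itself):
all mock_* names are hypothetical, the "CPU" is a plain struct, and the bit
position used for MOCK_MSR_HV is an assumption meant to mirror the kernel's
MSR_HV. The hypervisor test reads the MSR value directly, and PACA setup
runs before feature parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MSR_HV (hypervisor state) bit. */
#define MOCK_MSR_HV (1ULL << 60)

struct mock_boot_paca {
	int cpu;
};

/* Hypothetical model of the CPU state that matters here. */
struct mock_cpu_state {
	uint64_t msr;                 /* machine state register value */
	struct mock_boot_paca *r13;   /* stand-in for r13 / local_paca */
	int took_hv_path;             /* records which branch setup took */
	int features_parsed;
};

/* Decide on the hypervisor path from the MSR, not from a CPU feature
 * flag, because features have not been parsed yet at this point. */
void mock_setup_paca(struct mock_cpu_state *cpu, struct mock_boot_paca *new_paca)
{
	assert(new_paca != NULL);
	cpu->r13 = new_paca;
	cpu->took_hv_path = (cpu->msr & MOCK_MSR_HV) ? 1 : 0;
}

/* Stand-in for instrumented feature parsing: it needs a valid r13. */
int mock_parse_cpu_features(struct mock_cpu_state *cpu)
{
	if (cpu->r13 == NULL)
		return -1; /* the real kernel would dereference garbage here */
	cpu->features_parsed = 1;
	return 0;
}

/* New ordering: set up the PACA first, then parse CPU features. */
int mock_early_setup(struct mock_cpu_state *cpu, struct mock_boot_paca *boot_paca)
{
	mock_setup_paca(cpu, boot_paca);
	return mock_parse_cpu_features(cpu);
}
```

Calling mock_parse_cpu_features() on a zeroed mock_cpu_state models the old
ordering and fails, while mock_early_setup() succeeds for both a host-like
MSR (MOCK_MSR_HV set) and a guest-like one.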

Translations get switched on once we leave early_setup, so I think we'd
already catch any other cases where the PACA or task aren't set up.

Boot tested on a P9 guest and host.

Fixes: fb0b0a73b223 ("powerpc: Enable kcov")
Cc: Andrew Donnellan <a...@linux.ibm.com>
Suggested-by: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Daniel Axtens <d...@axtens.net>

---

Regarding moving the comment about printk()-safety:
I am about 75% sure that the thing that makes printk() safe is the PACA,
not the CPU features. That's what commit 24d9649574fb ("[POWERPC] Document
when printk is useable") seems to indicate, but as someone wise recently
told me, "bootstrapping is hard", so I may be totally wrong.

v3: Update comment, thanks Christophe Leroy.
     Remove a comment in dt_cpu_ftrs.c that is no longer accurate - thanks
       Andrew. I think we want to retain all the code still, but I'm open to
       being told otherwise.

Thanks for doing that.

This patch and the justification don't seem obviously wrong, and the patch is snowpatch-clean.

Reviewed-by: Andrew Donnellan <a...@linux.ibm.com>

(Is it worth cc'ing this to stable in case there are other situations we haven't foreseen where we hit the unpredictable r13 data? Few people use kcov...)


--
Andrew Donnellan              OzLabs, ADL Canberra
a...@linux.ibm.com             IBM Australia Limited
