On 14/02/2020 at 07:24, Christophe Leroy wrote:
Larry,
On 14/02/2020 at 00:09, Larry Finger wrote:
Christophe,
With this patch, it gets further. Sometime after the boot process
tries to start process init, it crashes with "unable to read data
at 0x000157a0" and a faulting address of 0xc001683c. The screenshot is
attached and the gzipped vmlinux is at
http://www.lwfinger.com/download/vmlinux2.gz. The patches that were
applied for this kernel are also attached.
Did you try with the patch at https://patchwork.ozlabs.org/patch/1237387/ ?
I see the problem happens in kprobe_handler(). Can you try without
CONFIG_KPROBES?
In fact, you hit two bugs. The first one is due to CONFIG_VMAP_STACK.
The second one has always existed (at least since the kernel source tree
has been in git).
The first bug is in the enter_rtas() function, which tries to read data
on the stack by using the linear physical address translation. This
cannot be used with a VM stack: the function must re-enable data MMU
translation to access data on the stack.
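
To illustrate why the linear translation breaks with a VM stack, here is a
minimal user-space sketch. It is not kernel code: PAGE_OFFSET is the usual
ppc32 default but is assumed here, LOWMEM_SIZE and the stack addresses are
made-up values, and struct toy_mapping and the three helpers are hypothetical
names for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u /* assumed ppc32 kernel base */
#define LOWMEM_SIZE 0x10000000u /* hypothetical: 256 MiB linear-mapped */

/* True when virt is in the linear mapping, where phys = virt - PAGE_OFFSET. */
static bool in_linear_map(uint32_t virt)
{
    return virt >= PAGE_OFFSET && virt - PAGE_OFFSET < LOWMEM_SIZE;
}

/* Linear translation: only valid for addresses in the linear mapping. */
static uint32_t linear_virt_to_phys(uint32_t virt)
{
    return virt - PAGE_OFFSET;
}

/* Hypothetical one-entry "page table" describing one vmalloc'ed stack page. */
struct toy_mapping {
    uint32_t virt_page;
    uint32_t phys_page;
};

/* Toy page-table walk: the only way to find where a VM stack page lives. */
static uint32_t toy_walk(const struct toy_mapping *m, uint32_t virt)
{
    assert((virt & ~0xfffu) == m->virt_page);
    return m->phys_page | (virt & 0xfffu);
}
```

With a stack pointer outside the linear mapping, linear_virt_to_phys() still
returns a number, but it is unrelated to the page that actually backs the
stack, which is what a real-mode access ends up touching.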
The second bug is in the kprobe_handler() function, which does:
	if (*addr != BREAKPOINT_INSTRUCTION)
addr is the address where the 'trap' happened. When a trap happens with
the MMU disabled, addr contains the physical address of the trap.
kprobe_handler() then tries to read the instruction through that physical
address while the MMU is enabled, so you get a bad access, either because
the address is not mapped or because access to userspace is not allowed.
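
As a rough illustration of the address mix-up, the user-space sketch below
models a trap taken with translation off. It is not the kernel's code and not
necessarily how the bug gets fixed: struct toy_regs and both helpers are
hypothetical, the MSR_IR/MSR_DR bit values and PAGE_OFFSET are assumed to match
the usual ppc32 definitions, and the translation only holds for addresses in
the linear mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IR      0x20u       /* instruction translation on (assumed value) */
#define MSR_DR      0x10u       /* data translation on (assumed value) */
#define PAGE_OFFSET 0xc0000000u /* assumed ppc32 kernel base */

/* Hypothetical stand-in for the two pt_regs fields that matter here. */
struct toy_regs {
    uint32_t msr; /* MSR at the time of the trap */
    uint32_t nip; /* address of the trapping instruction */
};

/* A trap was taken in real mode if either translation bit was off. */
static bool trap_in_real_mode(const struct toy_regs *regs)
{
    return (regs->msr & (MSR_IR | MSR_DR)) != (MSR_IR | MSR_DR);
}

/* Address that can be dereferenced once the MMU is back on. */
static uint32_t trap_insn_vaddr(const struct toy_regs *regs)
{
    if (trap_in_real_mode(regs)) {
        /* nip is physical: guard against wraparound, then shift it
         * into the linear mapping. */
        assert(regs->nip <= UINT32_MAX - PAGE_OFFSET);
        return regs->nip + PAGE_OFFSET;
    }
    return regs->nip; /* already a virtual address */
}
```

With the values from the report, a real-mode trap at 0x000157a0 corresponds to
0xc00157a0 in the linear mapping. A handler could also simply decline to
handle traps for which trap_in_real_mode() is true instead of translating.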
Due to the first bug, you get a 'machine check', and as
current->thread.rtas_sp has not been cleared yet, the machine check
handler jumps to 'machine_check_in_rtas'.
machine_check_in_rtas does a trap, which in turn triggers the second bug.
Once the first bug is fixed, the second one should not pop up.
Can you test the patch at https://patchwork.ozlabs.org/patch/1237929/ ,
which fixes the first bug?
Christophe