> On 21 Oct 2016, at 16:49, Rob Gardner <[email protected]> wrote:
> 
> On 10/21/2016 06:57 AM, Anatoly Pugachev wrote:
>> On Fri, Oct 21, 2016 at 12:12 PM, Anatoly Pugachev <[email protected]> 
>> wrote:
>>> On Wed, Sep 7, 2016 at 1:01 PM, Anatoly Pugachev <[email protected]> wrote:
>>>> On Wed, Sep 7, 2016 at 12:22 PM, John Paul Adrian Glaubitz
>>>> <[email protected]> wrote:
>>>>> Hello!
>>>>> 
>>>>> After kernel 4.7.2 entered Debian unstable, I decided to upgrade the 
>>>>> buildds and ran into an
>>>>> apparent regression with the 4.7.x kernels on sun4u machines:
>>>> It's not only with sun4u, we're getting kernel OOPS on sun4v as well:
>>> debian packaged 4.7.6 kernel, machine is a LDOM on T5-2 server, OOPS
>>> after kernel boot within a few minutes:
>> 
>> reproduced with latest git 4.9.0-rc1+ (v4.9-rc1-148-g6f33d645) kernel.
>> Machine boots ok, I can log in as an unprivileged user (via ssh), compile
>> and install a kernel, run sudo, install packages (apt upgrade), and
>> apache/mysql and other startup daemons work, but if I try to log in as
>> root via ssh, it throws a kernel oops / illegal instruction.
>> 
>> Any idea how to debug this?
>> 
>> otherhost$ ssh ttip -l root -v
>> ...
>> debug1: channel 0: new [client-session]
>> debug1: Requesting [email protected]
>> debug1: Entering interactive session.
>> Write failed: Broken pipe
>> $
>> 
>> I can strace -f -p $pid_of_sshd , but not sure it would help.
>> 
>> URL version => http://paste.debian.net/plain/884751
>> kernel config => http://paste.debian.net/plain/884806
>> 
>> NOTICE: Entering OpenBoot.
>> NOTICE: Fetching Guest MD from HV.
>> NOTICE: Starting additional cpus.
>> NOTICE: Initializing LDC services.
>> NOTICE: Probing PCI devices.
>> NOTICE: Finished PCI probing.
>> 
>> SPARC T5-2, No Keyboard
>> Copyright (c) 1998, 2016, Oracle and/or its affiliates. All rights reserved.
>> OpenBoot 4.38.5, 32.0000 GB memory available, Serial #83494642.
>> Ethernet address 0:14:4f:fa:6:f2, Host ID: 84fa06f2.
>> 
>> 
>> 
>> Boot device: vdisk1  File and args:
>> SILO Version 1.4.14
>> boot:
>> Allocated 64 Megs of memory at 0x40000000 for kernel
>> Uncompressing image...
>> Loaded kernel version 4.9.0
>> Loading initial ramdisk (13616359 bytes at 0x74000000 phys, 0x40C00000 
>> virt)...
>> 
>> [    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.38.5 2016/06/22 19:36'
>> [    0.000000] PROMLIB: Root node compatible: sun4v
>> [    0.000000] Linux version 4.9.0-rc1+ (mator@ttip) (gcc version
>> 6.2.0 20161010 (Debian 6.2.0-6+sparc64) ) #19 SMP Fri Oct 21 14:47:01
>> MSK 2016
>> [    0.000000] bootconsole [earlyprom0] enabled
>> [    0.000000] ARCH: SUN4V
>> ... snip ...
>> [5446612.115339] dbus-daemon(521): Kernel illegal instruction [#3]
>> [5446612.115342] CPU: 15 PID: 521 Comm: dbus-daemon Tainted: G      D
>>        4.9.0-rc1+ #19
>> [5446612.115347] task: fff800080b331bc0 task.stack: fff80007f937c000
>> [5446612.115349] TSTATE: 0000004411001606 TPC: 00000000005ccfec TNPC:
>> 00000000005ccff0 Y: 00000000    Tainted: G      D
>> [5446612.115353] TPC: <__kmalloc_track_caller+0x14c/0x240>
>> [5446612.115355] g0: fff800080fb28b00 g1: 0000000000400000 g2:
>> 0000000000000000 g3: 00000000c0000000
>> [5446612.115357] g4: fff800080b331bc0 g5: fff800082c5b0000 g6:
>> fff80007f937c000 g7: 0000000000003c06
>> [5446612.115358] o0: 0000000000000000 o1: 00000000025106c0 o2:
>> 000000005a5a5a5a o3: fff800080fb28b00
>> [5446612.115360] o4: 5a5a5a5a5a5a5a5a o5: 0000000000000028 sp:
>> fff80007f937eda1 ret_pc: 00000000005ccfe4
>> [5446612.115362] RPC: <__kmalloc_track_caller+0x144/0x240>
>> [5446612.115365] l0: fff8000030402800 l1: 000007feffe44e40 l2:
>> 000007feffe452b0 l3: 0000000000000000
>> [5446612.115367] l4: 0000000000000000 l5: 0000000000000020 l6:
>> fff8000100b875c8 l7: fff800010026bf30
>> [5446612.115368] i0: 0000000000000240 i1: 00000000025106c0 i2:
>> 0000000000864e00 i3: 00000000025106c0
>> [5446612.115371] i4: 0000000000000000 i5: 00000000025106c0 i6:
>> fff80007f937ee51 i7: 0000000000864d40
>> [5446612.115376] I7: <__kmalloc_reserve.isra.5+0x20/0x80>
>> [5446612.115376] Call Trace:
>> [5446612.115378]  [0000000000864d40] __kmalloc_reserve.isra.5+0x20/0x80
>> [5446612.115381]  [0000000000864e00] __alloc_skb+0x60/0x180
>> [5446612.115383]  [0000000000864f68] alloc_skb_with_frags+0x48/0x1c0
>> [5446612.115390]  [000000000085f54c] sock_alloc_send_pskb+0x1ec/0x220
>> [5446612.115400]  [00000000009367a8] unix_stream_sendmsg+0x228/0x380
>> [5446612.115404]  [0000000000859ddc] sock_sendmsg+0x3c/0x80
>> [5446612.115406]  [000000000085a810] ___sys_sendmsg+0x250/0x260
>> [5446612.115409]  [000000000085b794] __sys_sendmsg+0x34/0x80
>> [5446612.115411]  [000000000085b800] SyS_sendmsg+0x20/0x40
>> [5446612.115415]  [00000000004061f4] linux_sparc_syscall+0x34/0x44
>> [5446612.115417] Caller[0000000000864d40]: __kmalloc_reserve.isra.5+0x20/0x80
>> [5446612.115419] Caller[0000000000864e00]: __alloc_skb+0x60/0x180
>> [5446612.115423] Caller[0000000000864f68]: alloc_skb_with_frags+0x48/0x1c0
>> [5446612.115425] Caller[000000000085f54c]: sock_alloc_send_pskb+0x1ec/0x220
>> [5446612.115428] Caller[00000000009367a8]: unix_stream_sendmsg+0x228/0x380
>> [5446612.115430] Caller[0000000000859ddc]: sock_sendmsg+0x3c/0x80
>> [5446612.115433] Caller[000000000085a810]: ___sys_sendmsg+0x250/0x260
>> [5446612.115435] Caller[000000000085b794]: __sys_sendmsg+0x34/0x80
>> [5446612.115437] Caller[000000000085b800]: SyS_sendmsg+0x20/0x40
>> [5446612.115439] Caller[00000000004061f4]: linux_sparc_syscall+0x34/0x44
>> [5446612.115442] Caller[fff800010081770c]: 0xfff800010081770c
>> [5446612.115444] Instruction DUMP:
>> [5446612.115445]  ba100008
>> [5446612.115446]  400f1d4f
>> [5446612.115447]  01000000
>> [5446612.115447] <3ffffff2>
>> [5446612.115448]  01000000
>> [5446612.115450]  106fffbe
>> [5446612.115451]  01000000
>> [5446612.115452]  c611a036
>> [5446612.115452]  05002c16
>> [5446612.115452]
>> [5446612.115778] Caller[00000000005f9ed4]: SyS_mkdir+0x14/0x40
>> [5446612.115791] Caller[00000000004061f4]: linux_sparc_syscall+0x34/0x44
>> [5446612.115802] Caller[fff80001001ef870]: 0xfff80001001ef870
>> [5446612.115818] Instruction DUMP:[5446612.115823]  ba100008
>>  400f1baf [5446612.115839]  01000000
>> <3ffffff2>[5446612.115852]  01000000
>>  106fffbe [5446612.115866]  01000000
>>  c611a036 [5446612.115879]  05002c16
>> [5446612.115892]
>> [5446612.115902] Fixing recursive fault but reboot is needed!
> 
> 
> In the instruction dump, the offending instruction is always 3ffffff2, and 
> according to the opcode map, this is some kind of Fujitsu Athena instruction 
> which probably ought to never be generated by gcc. Can you check to see if 
> this instruction is in your vmlinux file? Do 'objdump -d vmlinux' and go to 
> the addresses shown in TPC in the dump (i.e., 00000000005ccfec) and see what's 
> there. If you see 3ffffff2, then somehow some bogus instruction made it into 
> the vmlinux executable. If you see something else, then it means that the 
> instruction got changed in memory after the system was booted. That could be 
> either a stray memory write or a boot time patch gone wrong. Either way, it 
> may help narrow down the problem.

Hi Rob,
They are definitely NOPs in vmlinux being clobbered at load/runtime. According
to "gdb vmlinux", the call to _cond_resched is coming from mm/slab.h
slab_pre_alloc_hook (the call to might_sleep_if). What's the best way to get a
backtrace for writes to this address?

Regards,
James
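
For anyone following along with the instruction dumps quoted above, the raw
words can be split into fields by hand. The following is only a minimal
sketch, assuming the standard SPARC V9 format 1 (call) and format 2
(sethi/branch) bit layouts; the function names (decode, call_target,
sign_extend) are made up for illustration and come from no existing tool. It
does not name the instruction behind 3ffffff2, it only exposes the bit fields
one would look up in the opcode map.

```python
# Sketch: split a 32-bit SPARC instruction word into its top-level fields.
# Field positions follow the SPARC V9 format 1 (call) and format 2
# (sethi/branch) layouts. All helper names are hypothetical.

NOP = 0x01000000  # encoding of "sethi 0, %g0", the canonical SPARC nop


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def decode(word):
    """Return the fields relevant to the words seen in the oops dumps."""
    op = (word >> 30) & 0x3
    if op == 1:
        # Format 1: call with a 30-bit word displacement.
        return {"op": op, "kind": "call", "disp30": sign_extend(word, 30)}
    if op == 0:
        # Format 2: sethi and branches, distinguished by op2.
        return {
            "op": op,
            "kind": "nop" if word == NOP else "format2",
            "op2": (word >> 22) & 0x7,
            "bits29_25": (word >> 25) & 0x1F,  # rd (sethi) or a+cond (branch)
            "low22": word & 0x3FFFFF,
        }
    # op == 2 or 3: format 3 (arithmetic, load/store), not broken down here.
    return {"op": op, "kind": "format3"}


def call_target(pc, word):
    """Address that a call located at `pc` transfers to: pc + 4 * disp30."""
    return pc + 4 * decode(word)["disp30"]


if __name__ == "__main__":
    for w in (0x400F1D4F, 0x01000000, 0x3FFFFFF2):
        print(f"{w:08x}", decode(w))
    # ret_pc in the first oops (0x5ccfe4) is the address of the call itself.
    print(hex(call_target(0x5CCFE4, 0x400F1D4F)))
```

Run as a script, this prints the fields for the call, the nop, and the
faulting word from the first dump, followed by the call target computed from
ret_pc (0x994520), which can then be looked up in System.map or with
"objdump -d vmlinux".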
