Oops mystery

Steve Wise Fri, 12 Jul 2013 08:49:00 -0700

Hello kernel experts,

I was wondering if someone has any ideas on this Oops. My analysis must be incorrect. From what I can tell, this shouldn't have caused a bad page fault, but it did :).


Here is what I see in the crash dump:

dmesg log shows this:

[ 1053.156266] BUG: unable to handle kernel paging request at 0000000000040fc0
[ 1053.216620] IP: [<ffffffffa02b202e>] c4iw_ev_handler+0x2e/0x84 [iw_cxgb4]
[ 1053.216638] PGD 8b9877067 PUD 86cd37067 PMD 0
[ 1053.216642] Oops: 0002 [#1] SMP
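
For reference, here is a small Python sketch of how the "Oops: 0002" value can be read, assuming the standard x86 page-fault error-code bit layout (bit 0 = protection violation vs. page not present, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit, bit 4 = instruction fetch). The names PF_FLAGS and decode_pf_error are made up for this illustration and are not kernel or crash APIs.

```python
# Hypothetical helper: decode an x86 page-fault error code as printed
# after "Oops:" in the log. Bit meanings assume the standard x86 layout.
PF_FLAGS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code):
    """Return a list of human-readable descriptions for the error code."""
    out = []
    for bit, if_set, if_clear in PF_FLAGS:
        if code & bit:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


decoded = decode_pf_error(0x0002)
print(decoded)  # ['page not present', 'write access', 'kernel mode']
```

Under that assumption, 0002 is a kernel-mode write to an address with no page mapped, which fits the store instruction at the faulting IP.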

c4iw_ev_handler+0x2e is:

crash> dis -r c4iw_ev_handler+0x2e
0xffffffffa02b2000 <c4iw_ev_handler>:    push   %rbp
0xffffffffa02b2001 <c4iw_ev_handler+1>:  push   %rbx
0xffffffffa02b2002 <c4iw_ev_handler+2>:  sub    $0x8,%rsp
0xffffffffa02b2006 <c4iw_ev_handler+6>:  mov    %rdi,%rbp
0xffffffffa02b2009 <c4iw_ev_handler+9>:  mov    %esi,%ebx
0xffffffffa02b200b <c4iw_ev_handler+11>: lea    0x8a0(%rdi),%rdi
0xffffffffa02b2012 <c4iw_ev_handler+18>: callq  0xffffffff811e1020 <idr_find>
0xffffffffa02b2017 <c4iw_ev_handler+23>: mov    %rax,%rcx
0xffffffffa02b201a <c4iw_ev_handler+26>: test   %rax,%rax
0xffffffffa02b201d <c4iw_ev_handler+29>: je     0xffffffffa02b203d <c4iw_ev_handler+61>
0xffffffffa02b201f <c4iw_ev_handler+31>: movzwl 0x88(%rax),%eax
0xffffffffa02b2026 <c4iw_ev_handler+38>: mov    0x38(%rcx),%rdx
0xffffffffa02b202a <c4iw_ev_handler+42>: shl    $0x6,%rax
0xffffffffa02b202e <c4iw_ev_handler+46>: movb   $0x0,0xe(%rax,%rdx,1)

Crash shows these regs:

crash> bt
PID: 12915  TASK: ffff8808d50da200  CPU: 4   COMMAND: "DSI_SvrReceiveR"
 #0 [ffff880751c039b0] machine_kexec at ffffffff81020a62
 #1 [ffff880751c03a00] crash_kexec at ffffffff81088780
 #2 [ffff880751c03ad0] oops_end at ffffffff8139efe0
 #3 [ffff880751c03af0] __bad_area_nosemaphore at ffffffff8102ed15
 #4 [ffff880751c03bb0] page_fault at ffffffff8139e25f
    [exception RIP: c4iw_ev_handler+46]
    RIP: ffffffffa02b202e  RSP: ffff880751c03c60  RFLAGS: 00010206
    RAX: 0000000000040fc0 RBX: 0000000000000404  RCX: ffff880c35da9080
    RDX: ffff8808b5500000 RSI: 0000000000000404  RDI: ffff8808d5fabd50
    RBP: ffff880c2e5a4000   R8: 0000000000000000   R9: ffff8808d5fabb30
    R10: 0000000000000110  R11: ffffffff8101f9b0  R12: 0000000000000000
    R13: ffff880c20598230  R14: ffff880c2e5a4000  R15: ffff880c3dbf1480
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
<snip>

So 'movb $0x0,0xe(%rax,%rdx,1)' should be storing 0 into the byte location:


%rax + 0xe + (%rdx * 1) ==
0x40fc0 + 0xe + 0xffff8808b5500000 ==
0xffff8808b5540fce.
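
To double-check the arithmetic, here is a minimal Python sketch of the same calculation. It assumes the usual AT&T operand form disp(base,index,scale), meaning base + index*scale + disp, wrapped to 64 bits; the function name effective_address is just for illustration.

```python
# Sketch: recompute the store address for 'movb $0x0,0xe(%rax,%rdx,1)'
# using the register values from the crash backtrace.
MASK64 = (1 << 64) - 1


def effective_address(base, index, scale, disp):
    """AT&T operand disp(base,index,scale): base + index*scale + disp, mod 2**64."""
    return (base + index * scale + disp) & MASK64


rax = 0x0000000000040FC0  # RAX from the exception frame (base)
rdx = 0xFFFF8808B5500000  # RDX from the exception frame (index)

addr = effective_address(rax, rdx, 1, 0xE)
print(hex(addr))  # 0xffff8808b5540fce
```

This matches the hand calculation, so the expected store address with those register values is 0xffff8808b5540fce and not the 0x40fc0 reported in the log.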

That address is readable in the crash dump:

crash> x/8b 0x0000000000040fc0+0xe+0xffff8808b5500000
0xffff8808b5540fce: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

And why does the page fault show 0x40fc0 as the faulting address? It should be 0xffff8808b5540fce and it shouldn't have caused a page fault.


What am I missing?

Thanks in advance,

Steve.
