Here is an analysis of the kernel core dump captured for this issue :

crash> sys
     KERNEL: vmlinux-3.2.0-38-generic  
    DUMPFILE: VmCore
        CPUS: 24
        DATE: Wed Sep 18 22:34:35 2013
      UPTIME: 1 days, 11:33:14
LOAD AVERAGE: 2.04, 2.09, 2.16
       TASKS: 6656
    NODENAME: ddb-mongo41
     RELEASE: 3.2.0-38-generic
     VERSION: #61-Ubuntu SMP Tue Feb 19 12:18:21 UTC 2013
     MACHINE: x86_64  (2533 Mhz)
      MEMORY: 47.9 GB
       PANIC: "[127932.907100] kernel BUG at 
/build/buildd/linux-3.2.0/mm/mempolicy.c:1638!"
         PID: 21767
     COMMAND: "mongod"
        TASK: ffff8800874bdc00  [THREAD_INFO: ffff880544d8a000]
         CPU: 12
       STATE: EXIT_DEAD (PANIC)

Analysis
========
This is the backtrace of the panic task :

crash> bt
PID: 21767  TASK: ffff8800874bdc00  CPU: 12  COMMAND: "mongod"
 #0 [ffff880657cc3820] machine_kexec at ffffffff810393da
 #1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8
 #2 [ffff880657cc3960] oops_end at ffffffff8165e528
 #3 [ffff880657cc3990] die at ffffffff810178d8
 #4 [ffff880657cc39c0] do_trap at ffffffff8165de94
 #5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65
 #6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b
    [exception RIP: slab_node+0x2e]
    RIP: ffffffff8115a66e  RSP: ffff880657cc3b70  RFLAGS: 00010097
    RAX: 0000000000000000  RBX: ffff880657802c00  RCX: 00000000e62f6aef
    RDX: 0000000000000000  RSI: 0000000000000020  RDI: ffff880abf18a288
    RBP: ffff880657cc3b80   R8: 0000000000000001   R9: 0000000100100010
    R10: 0000000000000000  R11: 0000000000000022  R12: 0000000000000002
    R13: 0000000000000000  R14: 00000000ffffffff  R15: 0000000000000020
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
 #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
#11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14
#12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic]
#13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic]
#14 [ffff880657cc3e58] net_rx_action at ffffffff81545274
#15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8
#16 [ffff880657cc3f38] call_softirq at ffffffff81667bec
#17 [ffff880657cc3f50] do_softirq at ffffffff81016305
#18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de
#19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3
--- <IRQ stack> ---
#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
    [exception RIP: __slab_free+0x2e1]
    RIP: ffffffff81649467  RSP: ffff880544d8bdf8  RFLAGS: 00000202
    RAX: 0000000000000001  RBX: ffffffffff0a0210  RCX: 0000000180aa00a9
    RDX: 0000000180aa00aa  RSI: ffffea002afc6201  RDI: ffff880657806200
    RBP: ffff880544d8bea8   R8: 0000000000000001   R9: 0000000000000000
    R10: ffff8800874be020  R11: ffff8800874be030  R12: ffff880544d8be33
    R13: 000000000000000d  R14: ffffffff81191895  R15: ffff880544d8bdb8
    ORIG_RAX: ffffffffffffff54  CS: 0010  SS: 0018
#21 [ffff880544d8be30] __change_pid at ffffffff81087dca
#22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634
#23 [ffff880544d8bef0] __mpol_put at ffffffff81159937
#24 [ffff880544d8bf00] do_exit at ffffffff8106c75c
#25 [ffff880544d8bf70] sys_exit at ffffffff8106caf7
#26 [ffff880544d8bf80] system_call_fastpath at ffffffff81665982
    RIP: 00007f6f476b8f37  RSP: 00007f68cbcfdbb0  RFLAGS: 00000202
    RAX: 000000000000003c  RBX: ffffffff81665982  RCX: ffffffffffffffff
    RDX: 00007f68cbcfe700  RSI: 00007f6f478c9250  RDI: 0000000000000000
    RBP: 0000000000000000   R8: 00007f68cbcfe700   R9: 00007f68e82a0370
    R10: 000000007fffffff  R11: 0000000000000246  R12: ffffffff8106caf7
    R13: ffff880544d8bf78  R14: 0000000000000003  R15: 00007f68f8744a10
    ORIG_RAX: 000000000000003c  CS: 0033  SS: 002b

And the register information :

[127932.906804] ------------[ cut here ]------------
[127932.907100] 
[127932.907502] invalid opcode: 0000 [#1] SMP 
[127932.907774] CPU 12 
[127932.907901] Modules linked in: vesafb joydev iomemory_vsl(P) i7core_edac 
ioatdma edac_core dca tpm_tis acpi_power_meter mac_hid lp parport
 usbhid hid usb_storage mptsas enic mptscsih mptbase scsi_transport_sas
[127932.909246] 
[127932.909340] Pid: 21767, comm: mongod Tainted: P           O 
3.2.0-38-generic #61-Ubuntu Cisco Systems Inc UCSC-BSE-SFF-C200/UCSC-BSE-SFF-C
200
[127932.910150] RIP: 0010:[<ffffffff8115a66e>]  [<ffffffff8115a66e>] 
slab_node+0x2e/0xa0
[127932.910614] RSP: 0018:ffff880657cc3b70  EFLAGS: 00010097
[127932.910921] RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 
00000000e62f6aef
[127932.911344] RDX: 0000000000000000 RSI: 0000000000000020 RDI: 
ffff880abf18a288
[127932.911763] RBP: ffff880657cc3b80 R08: 0000000000000001 R09: 
0000000100100010
[127932.912184] R10: 0000000000000000 R11: 0000000000000022 R12: 
0000000000000002
[127932.912625] R13: 0000000000000000 R14: 00000000ffffffff R15: 
0000000000000020
[127932.913126] FS:  0000000000000000(0000) GS:ffff880657cc0000(0000) 
knlGS:0000000000000000
[127932.913621] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[127932.913986] CR2: 00007fb8e6d01218 CR3: 00000006468d2000 CR4: 
00000000000006e0
[127932.936937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[127932.960466] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[127932.984295] Process mongod (pid: 21767, threadinfo ffff880544d8a000, task 
ffff8800874bdc00)
[127933.008400] Stack:
[127933.019992]  ffffffff8188ee40 ffff880657cd6720 ffff880657cc3c10 
ffffffff816496a0
[127933.043734]  ffff880657801e80 ffff8800874bdc00 ffff8800874bdc00 
0000000077f9e84e
[127933.068114]  ffff880657cd6700 0000000000000008 ffff880657cc3bf0 
ffff8800874bdc00
[127933.092225] Call Trace:
[127933.103955]  <IRQ> 

The current task is actually in the process of exiting :

 sys_exit -> do_exit

The execution of __mpol_put() is in the process of destructing the mempolicy 
structure of the current task
:
#ifdef CONFIG_NUMA
        task_lock(tsk);
        mpol_put(tsk->mempolicy);
        tsk->mempolicy = NULL;
        task_unlock(tsk);
#endif


crash> task | grep mempolicy
  mempolicy = 0xffff880abf18a288, 

crash> * mempolicy 0xffff880abf18a288
struct mempolicy {
  refcnt = {
    counter = 0x0
  }, 
  mode = 0x0, 
  flags = 0x0, 
  v = {
    preferred_node = 0x3, 
    nodes = {
      bits = {0x3}
    }
  }, 
  w = {
    cpuset_mems_allowed = {
      bits = {0x3}
    }, 
    user_nodemask = {
      bits = {0x3}
    }
  }
}

/* Slow path of a mpol destructor. */
void __mpol_put(struct mempolicy *p)
{
        if (!atomic_dec_and_test(&p->refcnt))
                return;
        kmem_cache_free(policy_cache, p);
}

The p->refcnt has already been decreased and kmem_cache_free is being
performed to release the mempolicy element :

The IRQ is triggered while __slab_free() is being executed :

#20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
    [exception RIP: __slab_free+0x2e1]

The slab where the eement is being freed is :

RDI: ffff880657806200

crash> kmem -s 0xffff880abf18a288
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff880657806200 numa_policy               24      41373     43180    254     4k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffea002afc6280  ffff880abf18a000     1    169        169     0
  FREE / [ALLOCATED]
  [ffff880abf18a288]


which is indeed the slab where the mempolicy is 


__slab_free+0x2e1 is at the very end of __slab_free() :

Looking at the RIP where __slab_free should return, we can see that
__slab_free was about to return when the IRQ came in :

crash> dis -l __slab_free
....
0xffffffff81649457 <__slab_free+0x2d1>: call   0xffffffff8165d590 
<_raw_spin_unlock_irqrestore>
/build/buildd/linux-3.2.0/mm/slub.c: 2517
0xffffffff8164945c <__slab_free+0x2d6>: mov    rsi,r12
0xffffffff8164945f <__slab_free+0x2d9>: mov    rdi,r13
0xffffffff81649462 <__slab_free+0x2dc>: call   0xffffffff81162f90 <discard_slab>
/build/buildd/linux-3.2.0/mm/slub.c: 2518
0xffffffff81649467 <__slab_free+0x2e1>: add    rsp,0x88

        spin_unlock_irqrestore(&n->list_lock, flags);
        stat(s, FREE_SLAB);
        discard_slab(s, page);
}

So most probably the mempolicy page has now been freed.

The IRQ processing now happens and goes through many functions until it
needs to allocate a new slab element for an skb (network buffer) :

    [exception RIP: slab_node+0x2e]
    RIP: ffffffff8115a66e  RSP: ffff880657cc3b70  RFLAGS: 00010097
    RAX: 0000000000000000  RBX: ffff880657802c00  RCX: 00000000e62f6aef
    RDX: 0000000000000000  RSI: 0000000000000020  RDI: ffff880abf18a288
    RBP: ffff880657cc3b80   R8: 0000000000000001   R9: 0000000100100010
    R10: 0000000000000000  R11: 0000000000000022  R12: 0000000000000002
    R13: 0000000000000000  R14: 00000000ffffffff  R15: 0000000000000020
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
 #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
#10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8

__slab_alloc (from slab_alloc) gets called with a kmem_cache structure
retrieved from get_slab()

crash> bt -F
 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
    ffff880657cc3b90: [kmem_cache_node] [task_struct]    
    ffff880657cc3ba0: [task_struct]    0000000077f9e84e 
    ffff880657cc3bb0: ffff880657cd6700 0000000000000008 
    ffff880657cc3bc0: ffff880657cc3bf0 [task_struct]    
    ffff880657cc3bd0: [kmalloc-2048]   [kmalloc-8192]   
    ffff880657cc3be0: 0000000000000008 ffff880657cd6700 
    ffff880657cc3bf0: 0000000000000000 ffff880657802c00 
    ffff880657cc3c00: 00000000ffffffff 0000000000000282 
    ffff880657cc3c10: ffff880657cc3cc0 __slab_alloc+0x14b 
 #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf

crash> bt -f
 #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
    ffff880657cc3b90: ffff880657801e80 ffff8800874bdc00 
    ffff880657cc3ba0: ffff8800874bdc00 0000000077f9e84e 
    ffff880657cc3bb0: ffff880657cd6700 0000000000000008 
    ffff880657cc3bc0: ffff880657cc3bf0 ffff8800874bdc00 
    ffff880657cc3bd0: ffff880877f9e84e ffff880c462e0000 
    ffff880657cc3be0: 0000000000000008 ffff880657cd6700 
    ffff880657cc3bf0: 0000000000000000 ffff880657802c00 
    ffff880657cc3c00: 00000000ffffffff 0000000000000282 
    ffff880657cc3c10: ffff880657cc3cc0 ffffffff816498cf 

ffff880657cc3b90: [kmem_cache_node] = ffff880657801e80

But while executing get_any_partial which is only NUMA specific #IFDEF code,
it does the following :

zonelist = node_zonelist(slab_node(current->mempolicy), flags);

current being the current task pointer, it tries does the slab_node() on
the mempolicy that was recently freed prior to the IRQ hitting after
discard_slab

This mempolicy has now been freed, which most probably explains why the
value of mode is 0x0 :

crash> * mempolicy.mode ffff880abf18a288
  mode = 0x0

The behavior for slab_node when mempolicy.mode = 0x0 MPOL_DEFAULT is :

        default:
                BUG();
which triggers the panic.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1233175

Title:
  Kernel panic : mempolicy potential use-after-free on server running
  mongodb

Status in “linux” package in Ubuntu:
  Triaged
Status in “linux” source package in Precise:
  Triaged

Bug description:
  PID: 21767 TASK: ffff8800874bdc00 CPU: 12 COMMAND: "mongod"
   #0 [ffff880657cc3820] machine_kexec at ffffffff810393da
   #1 [ffff880657cc3890] crash_kexec at ffffffff810b53f8
   #2 [ffff880657cc3960] oops_end at ffffffff8165e528
   #3 [ffff880657cc3990] die at ffffffff810178d8
   #4 [ffff880657cc39c0] do_trap at ffffffff8165de94
   #5 [ffff880657cc3a20] do_invalid_op at ffffffff81014f65
   #6 [ffff880657cc3ac0] invalid_op at ffffffff8166796b
      [exception RIP: slab_node+46]
      RIP: ffffffff8115a66e RSP: ffff880657cc3b70 RFLAGS: 00010097
      RAX: 0000000000000000 RBX: ffff880657802c00 RCX: 00000000e62f6aef
      RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffff880abf18a288
      RBP: ffff880657cc3b80 R8: 0000000000000001 R9: 0000000100100010
      R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000002
      R13: 0000000000000000 R14: 00000000ffffffff R15: 0000000000000020
      ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
   #7 [ffff880657cc3b88] get_any_partial at ffffffff816496a0
   #8 [ffff880657cc3c18] __slab_alloc at ffffffff816498cf
   #9 [ffff880657cc3cc8] __kmalloc_node_track_caller at ffffffff81166f07
  #10 [ffff880657cc3d38] __alloc_skb at ffffffff815364c8
  #11 [ffff880657cc3d88] __netdev_alloc_skb at ffffffff81536b14
  #12 [ffff880657cc3da8] enic_rq_alloc_buf at ffffffffa005484c [enic]
  #13 [ffff880657cc3e08] enic_poll_msix at ffffffffa00559ff [enic]
  #14 [ffff880657cc3e58] net_rx_action at ffffffff81545274
  #15 [ffff880657cc3ec8] __do_softirq at ffffffff8106f5f8
  #16 [ffff880657cc3f38] call_softirq at ffffffff81667bec
  #17 [ffff880657cc3f50] do_softirq at ffffffff81016305
  #18 [ffff880657cc3f70] irq_exit at ffffffff8106f9de
  #19 [ffff880657cc3f80] do_IRQ at ffffffff816684a3
  --- <IRQ stack> ---
  #20 [ffff880544d8bd48] ret_from_intr at ffffffff8165d82e
      [exception RIP: __slab_free+737]
      RIP: ffffffff81649467 RSP: ffff880544d8bdf8 RFLAGS: 00000202
      RAX: 0000000000000001 RBX: ffffffffff0a0210 RCX: 0000000180aa00a9
      RDX: 0000000180aa00aa RSI: ffffea002afc6201 RDI: ffff880657806200
      RBP: ffff880544d8bea8 R8: 0000000000000001 R9: 0000000000000000
      R10: ffff8800874be020 R11: ffff8800874be030 R12: ffff880544d8be33
      R13: 000000000000000d R14: ffffffff81191895 R15: ffff880544d8bdb8
      ORIG_RAX: ffffffffffffff54 CS: 0010 SS: 0018
  #21 [ffff880544d8be30] __change_pid at ffffffff81087dca
  #22 [ffff880544d8beb0] kmem_cache_free at ffffffff81163634
  #23 [ffff880544d8bef0] __mpol_put at ffffffff81159937
  #24 [ffff880544d8bf00] do_exit at ffffffff8106c75c
  #25 [ffff880544d8bf70] sys_exit at ffffffff8106caf7
  #26 [ffff880544d8bf80] system_call_fastpath at ffffffff81665982
      RIP: 00007f6f476b8f37 RSP: 00007f68cbcfdbb0 RFLAGS: 00000202
      RAX: 000000000000003c RBX: ffffffff81665982 RCX: ffffffffffffffff
      RDX: 00007f68cbcfe700 RSI: 00007f6f478c9250 RDI: 0000000000000000
      RBP: 0000000000000000 R8: 00007f68cbcfe700 R9: 00007f68e82a0370
      R10: 000000007fffffff R11: 0000000000000246 R12: ffffffff8106caf7
      R13: ffff880544d8bf78 R14: 0000000000000003 R15: 00007f68f8744a10
      ORIG_RAX: 000000000000...

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1233175/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to