Hi,

On 07.03.2013 11:18, Hudzia, Benoit wrote:
> The server specs are as follows:
>       * 4x 10 core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
>       * 1TB of RAM 
>       * 1 ConnectX-2 IB adapter
> 
> Kernel Version : 3.5.0 
> 
> Note that if I downgrade to a 3.2 kernel, I do not experience this issue.
> However, I have to work with 3.5 or higher. Can somebody help me with this?

This is probably fixed by commit 89dd86db (mlx4_core: Allow large mlx4_buddy
bitmaps), which is included in 3.6 and later.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit?h=linux-3.6.y&id=89dd86db
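
For what it's worth, here is a rough back-of-the-envelope sketch of why the
allocation in mlx4_buddy_init can fail. As far as I recall, the commit makes
the driver fall back to vmalloc() when kmalloc() of a bitmap fails. The Python
below is only a model, not driver code: it assumes 4 KiB pages, MAX_ORDER = 11
(so a 4 MiB limit on physically contiguous allocations) and 8-byte longs, and
the names bitmap_bytes and largest_bitmap_fits are made up for illustration.
I have not checked which max_order your setup actually ends up with.

```python
# Rough model of the per-order bitmaps a buddy allocator keeps.
# Assumptions (not taken from this thread): 4 KiB pages, MAX_ORDER = 11,
# 8-byte longs. All names here are hypothetical.
PAGE_SIZE = 4096
MAX_ORDER = 11
KMALLOC_LIMIT = PAGE_SIZE << (MAX_ORDER - 1)  # 4 MiB contiguous at most
BYTES_PER_LONG = 8
BITS_PER_LONG = 8 * BYTES_PER_LONG


def bitmap_bytes(max_order, order):
    """Bytes needed for the bitmap that tracks blocks of the given order."""
    bits = 1 << (max_order - order)
    longs = (bits + BITS_PER_LONG - 1) // BITS_PER_LONG  # round up to longs
    return longs * BYTES_PER_LONG


def largest_bitmap_fits(max_order):
    """True if the order-0 bitmap (the largest one) stays within the limit."""
    return bitmap_bytes(max_order, 0) <= KMALLOC_LIMIT


for max_order in (24, 25, 26):
    print(max_order, bitmap_bytes(max_order, 0), largest_bitmap_fits(max_order))
```

Under these assumptions the order-0 bitmap doubles with each extra order and
exceeds the 4 MiB limit at max_order 26. I believe that is the kind of request
that trips the order >= MAX_ORDER warning in __alloc_pages_nodemask.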

Regards,
Dongsu

> Thanks 
> Benoit
> 
> Kernel log trace: 
> 
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423038] ------------[ cut here ]------------
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423049] WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x2b9/0x810()
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423050] Hardware name: QSSC-S4R
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423051] Modules linked in: joydev coretemp kvm_intel kvm microcode pcspkr ixgbe mlx4_core(+) igb mdio ioatdma i2c_i801 hid_generic lpc_ich i2c_core mfd_core dca tpm_tis tpm tpm_bios acpi_memhotplug evbug crc32c_intel megaraid_sas usbhid hid
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423078] Pid: 949, comm: modprobe Not tainted 3.5.0-heca-dev-34dd48a+ #29
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423079] Call Trace:
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423088]  [<ffffffff8104baef>] warn_slowpath_common+0x7f/0xc0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423091]  [<ffffffff8104bb4a>] warn_slowpath_null+0x1a/0x20
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423093]  [<ffffffff811028b9>] __alloc_pages_nodemask+0x2b9/0x810
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423096]  [<ffffffff81102785>] ? __alloc_pages_nodemask+0x185/0x810
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423101]  [<ffffffff81137086>] alloc_pages_current+0xb6/0x120
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423105]  [<ffffffff810fe02e>] __get_free_pages+0xe/0x40
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423108]  [<ffffffff8113fcff>] kmalloc_order_trace+0x3f/0xd0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423110]  [<ffffffff810fe02e>] ? __get_free_pages+0xe/0x40
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423113]  [<ffffffff811405e0>] __kmalloc+0x100/0x160
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423131]  [<ffffffffa01ba35d>] mlx4_buddy_init+0xed/0x1a0 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423140]  [<ffffffffa01bb8aa>] mlx4_init_mr_table+0xca/0x150 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423148]  [<ffffffffa01b6fa7>] mlx4_setup_hca.part.12+0xf7/0x4e0 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423156]  [<ffffffffa01aaeef>] ? mlx4_bitmap_init+0x8f/0xb0 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423164]  [<ffffffffa01b73bb>] mlx4_setup_hca+0x2b/0x70 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423172]  [<ffffffffa01b7ba4>] __mlx4_init_one+0x744/0x960 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423179]  [<ffffffffa01c55b6>] mlx4_init_one+0x3d/0x42 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423186]  [<ffffffff812e6e56>] pci_call_probe+0x96/0xb0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423189]  [<ffffffff812e8019>] pci_device_probe+0x79/0xa0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423194]  [<ffffffff813894fa>] ? driver_sysfs_add+0x7a/0xb0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423196]  [<ffffffff813896b8>] really_probe+0x68/0x200
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423198]  [<ffffffff81389982>] driver_probe_device+0x22/0x30
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423200]  [<ffffffff81389a3b>] __driver_attach+0xab/0xb0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423202]  [<ffffffff81389990>] ? driver_probe_device+0x30/0x30
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423205]  [<ffffffff81387c46>] bus_for_each_dev+0x56/0x90
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423207]  [<ffffffff813892fe>] driver_attach+0x1e/0x20
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423210]  [<ffffffff81388ed0>] bus_add_driver+0x1a0/0x270
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423216]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423218]  [<ffffffff81389f86>] driver_register+0x76/0x130
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423223]  [<ffffffff8157aa9d>] ? notifier_call_chain+0x4d/0x70
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423227]  [<ffffffff8109f0b0>] ? add_kallsyms+0x1e0/0x1e0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423233]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423235]  [<ffffffff812e7d85>] __pci_register_driver+0x55/0xd0
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423241]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423246]  [<ffffffffa01d20dd>] mlx4_init+0xac/0xec [mlx4_core]
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423250]  [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423253]  [<ffffffff810a18bf>] sys_init_module+0x8f/0x200
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423257]  [<ffffffff8157f0a9>] system_call_fastpath+0x16/0x1b
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423259] ---[ end trace 8886e8f0c535939d ]---
> Mar  7 03:12:27 bi-heca-02 kernel: [    7.423263] mlx4_core 0000:86:00.0: Failed to initialize memory region table, aborting.
> Mar  7 03:12:27 bi-heca-02 kernel: [    8.431444] mlx4_core: probe of 0000:86:00.0 failed with error -12
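
A side note on the "error -12" in the last line: the kernel reports failures
as negative errno values, so it can be decoded with a few lines of Python.
This is just a convenience sketch. The helper name describe_kernel_error is
made up here, and it relies on the errno table of the machine it runs on.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return code to (symbolic name, message)."""
    number = -code  # the kernel reports -ERRNO, the errno table is positive
    return errno.errorcode[number], os.strerror(number)


name, message = describe_kernel_error(-12)
print(name, message)
```

On Linux, 12 is ENOMEM, which matches an allocation failure during the probe
rather than a hardware fault.
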
> 
> 
> 
> Dr. Benoit Hudzia
> Senior Researcher
> 
> SAP Next Business and Technology 
> SAP (UK) Limited
> The Concourse Building 
> Queen's Road , Queen's Island, Titanic Quarter
> BT3 9TD Belfast
> T +44 (0)28 9078 5742
> F +44 (0)28 9078  5777
> M +44 (0)79 834 46729
> mailto:[email protected]
> www.sap.com/research
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html