Hi,

I am having trouble with my ConnectX-2 cards.

I have been testing on smaller servers without any problems, and this week
I upgraded to a beefier machine. However, I cannot get the IB card to
initialize with our current kernel: the mlx4_core probe fails (trace below).


The server specs are as follows:
        * 4x 10-core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
        * 1TB of RAM
        * 1 ConnectX-2 IB card

Kernel version: 3.5.0

Note that if I downgrade to a 3.2 kernel I do not see this issue. However, I
am required to work with 3.5 or higher. Can somebody help me with this?
Thanks,
Benoit

Kernel log trace: 

Mar  7 03:12:27 bi-heca-02 kernel: [    7.423038] ------------[ cut here 
]------------
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423049] WARNING: at 
mm/page_alloc.c:2298 __alloc_pages_nodemask+0x2b9/0x810()
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423050] Hardware name: QSSC-S4R
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423051] Modules linked in: joydev 
coretemp kvm_intel kvm microcode pcspkr ixgbe mlx4_core(+) igb mdio ioatdma 
i2c_i801 hid_generic lpc_ich i2c_core mfd_core dca tpm_tis tpm tpm_bios 
acpi_memhotplug evbug crc32c_intel megaraid_sas usbhid hid
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423078] Pid: 949, comm: modprobe Not 
tainted 3.5.0-heca-dev-34dd48a+ #29
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423079] Call Trace:
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423088]  [<ffffffff8104baef>] 
warn_slowpath_common+0x7f/0xc0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423091]  [<ffffffff8104bb4a>] 
warn_slowpath_null+0x1a/0x20
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423093]  [<ffffffff811028b9>] 
__alloc_pages_nodemask+0x2b9/0x810
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423096]  [<ffffffff81102785>] ? 
__alloc_pages_nodemask+0x185/0x810
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423101]  [<ffffffff81137086>] 
alloc_pages_current+0xb6/0x120
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423105]  [<ffffffff810fe02e>] 
__get_free_pages+0xe/0x40
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423108]  [<ffffffff8113fcff>] 
kmalloc_order_trace+0x3f/0xd0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423110]  [<ffffffff810fe02e>] ? 
__get_free_pages+0xe/0x40
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423113]  [<ffffffff811405e0>] 
__kmalloc+0x100/0x160
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423131]  [<ffffffffa01ba35d>] 
mlx4_buddy_init+0xed/0x1a0 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423140]  [<ffffffffa01bb8aa>] 
mlx4_init_mr_table+0xca/0x150 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423148]  [<ffffffffa01b6fa7>] 
mlx4_setup_hca.part.12+0xf7/0x4e0 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423156]  [<ffffffffa01aaeef>] ? 
mlx4_bitmap_init+0x8f/0xb0 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423164]  [<ffffffffa01b73bb>] 
mlx4_setup_hca+0x2b/0x70 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423172]  [<ffffffffa01b7ba4>] 
__mlx4_init_one+0x744/0x960 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423179]  [<ffffffffa01c55b6>] 
mlx4_init_one+0x3d/0x42 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423186]  [<ffffffff812e6e56>] 
pci_call_probe+0x96/0xb0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423189]  [<ffffffff812e8019>] 
pci_device_probe+0x79/0xa0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423194]  [<ffffffff813894fa>] ? 
driver_sysfs_add+0x7a/0xb0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423196]  [<ffffffff813896b8>] 
really_probe+0x68/0x200
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423198]  [<ffffffff81389982>] 
driver_probe_device+0x22/0x30
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423200]  [<ffffffff81389a3b>] 
__driver_attach+0xab/0xb0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423202]  [<ffffffff81389990>] ? 
driver_probe_device+0x30/0x30
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423205]  [<ffffffff81387c46>] 
bus_for_each_dev+0x56/0x90
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423207]  [<ffffffff813892fe>] 
driver_attach+0x1e/0x20
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423210]  [<ffffffff81388ed0>] 
bus_add_driver+0x1a0/0x270
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423216]  [<ffffffffa01d2031>] ? 
mlx4_catas_init+0x31/0x31 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423218]  [<ffffffff81389f86>] 
driver_register+0x76/0x130
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423223]  [<ffffffff8157aa9d>] ? 
notifier_call_chain+0x4d/0x70
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423227]  [<ffffffff8109f0b0>] ? 
add_kallsyms+0x1e0/0x1e0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423233]  [<ffffffffa01d2031>] ? 
mlx4_catas_init+0x31/0x31 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423235]  [<ffffffff812e7d85>] 
__pci_register_driver+0x55/0xd0
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423241]  [<ffffffffa01d2031>] ? 
mlx4_catas_init+0x31/0x31 [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423246]  [<ffffffffa01d20dd>] 
mlx4_init+0xac/0xec [mlx4_core]
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423250]  [<ffffffff8100203f>] 
do_one_initcall+0x3f/0x170
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423253]  [<ffffffff810a18bf>] 
sys_init_module+0x8f/0x200
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423257]  [<ffffffff8157f0a9>] 
system_call_fastpath+0x16/0x1b
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423259] ---[ end trace 
8886e8f0c535939d ]---
Mar  7 03:12:27 bi-heca-02 kernel: [    7.423263] mlx4_core 0000:86:00.0: 
Failed to initialize memory region table, aborting.
Mar  7 03:12:27 bi-heca-02 kernel: [    8.431444] mlx4_core: probe of 
0000:86:00.0 failed with error -12
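
For anyone reading the trace: the "error -12" on the last line is a negated
errno value, and 12 is ENOMEM, which matches the failed __kmalloc call in
mlx4_buddy_init above. A minimal sketch for decoding such codes follows; the
helper name describe_kernel_error is my own, not part of any kernel tooling.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negated errno as printed by the kernel (e.g. -12) to a name and message.

    describe_kernel_error is a hypothetical helper written for illustration.
    """
    num = abs(code)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(num, "UNKNOWN")
    # os.strerror returns the platform's message text for that errno.
    return f"{name}: {os.strerror(num)}"


if __name__ == "__main__":
    print(describe_kernel_error(-12))
```

This only names the error; it does not say why the allocation requested by the
driver could not be satisfied.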



Dr. Benoit Hudzia
Senior Researcher

SAP Next Business and Technology 
SAP (UK) Limited
The Concourse Building 
Queen's Road , Queen's Island, Titanic Quarter
BT3 9TD Belfast
T +44 (0)28 9078 5742
F +44 (0)28 9078  5777
M +44 (0)79 834 46729
mailto:[email protected]
www.sap.com/research
