Hi Or,
We didn't change that code; our code sits above the rdma_ucm layer. (We do not touch any of the core RDMA functions or drivers, we just use them.) We are using the default OFED setup (the drivers are loaded with the default config) and there is nothing special about it.

I will investigate the MAX_ORDER aspect ASAP and also test with 3.9-rc1.

However, I did a quick test: after physically removing half of the server's RAM (going from 1TB to 512GB), everything works fine.

Regards,
Benoit

> -----Original Message-----
> From: Or Gerlitz [mailto:[email protected]]
> Sent: 07 March 2013 15:34
> To: Hudzia, Benoit
> Cc: [email protected]; Jack Morgenstein
> Subject: Re: mlx4 module loading fail
>
> On 07/03/2013 13:18, Hudzia, Benoit wrote:
> > I am currently experiencing some trouble with my connectx2 cards. I have
> > been doing test with smallish server without any problem and this week I
> > upgraded to a more beefier option. However I fail to be able setup the IB
> > card with our current kernel.
> > The servers spec are as follow:
> > * 4x 10 core Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz stepping 02
> > * 1TB of RAM
> > * 1 connectx2 IB
> >
> > Kernel Version : 3.5.0 Note if I downgrade to a 3.2 kernel I do not
> > experience this issue. However I am forced to work with a 3.5 or higher. Can
> > somebody help me with that?
>
> Hi Benoit,
>
> As was suggested here can you try 3.8 or 3.9-rc1, this will help a lot
> to isolate the problem, but even before that, the warning you are
> getting is as of
> allocation with order > MAX_ORDER, what's MAX_ORDER under your
> configuration and what value do you provide to mlx4_buddy_init from
> mlx4_init_mr_table (did you modify that code?)
>
> Or.
>
> >
> > Kernel log trace:
> >
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423038] ------------[ cut here ]------------
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423049] WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x2b9/0x810()
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423050] Hardware name: QSSC-S4R
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423051] Modules linked in: joydev coretemp kvm_intel kvm microcode pcspkr ixgbe mlx4_core(+) igb mdio ioatdma i2c_i801 hid_generic lpc_ich i2c_core mfd_core dca tpm_tis tpm tpm_bios acpi_memhotplug evbug crc32c_intel megaraid_sas usbhid hid
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423078] Pid: 949, comm: modprobe Not tainted 3.5.0-heca-dev-34dd48a+ #29
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423079] Call Trace:
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423088] [<ffffffff8104baef>] warn_slowpath_common+0x7f/0xc0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423091] [<ffffffff8104bb4a>] warn_slowpath_null+0x1a/0x20
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423093] [<ffffffff811028b9>] __alloc_pages_nodemask+0x2b9/0x810
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423096] [<ffffffff81102785>] ? __alloc_pages_nodemask+0x185/0x810
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423101] [<ffffffff81137086>] alloc_pages_current+0xb6/0x120
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423105] [<ffffffff810fe02e>] __get_free_pages+0xe/0x40
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423108] [<ffffffff8113fcff>] kmalloc_order_trace+0x3f/0xd0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423110] [<ffffffff810fe02e>] ? __get_free_pages+0xe/0x40
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423113] [<ffffffff811405e0>] __kmalloc+0x100/0x160
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423131] [<ffffffffa01ba35d>] mlx4_buddy_init+0xed/0x1a0 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423140] [<ffffffffa01bb8aa>] mlx4_init_mr_table+0xca/0x150 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423148] [<ffffffffa01b6fa7>] mlx4_setup_hca.part.12+0xf7/0x4e0 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423156] [<ffffffffa01aaeef>] ? mlx4_bitmap_init+0x8f/0xb0 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423164] [<ffffffffa01b73bb>] mlx4_setup_hca+0x2b/0x70 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423172] [<ffffffffa01b7ba4>] __mlx4_init_one+0x744/0x960 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423179] [<ffffffffa01c55b6>] mlx4_init_one+0x3d/0x42 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423186] [<ffffffff812e6e56>] pci_call_probe+0x96/0xb0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423189] [<ffffffff812e8019>] pci_device_probe+0x79/0xa0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423194] [<ffffffff813894fa>] ? driver_sysfs_add+0x7a/0xb0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423196] [<ffffffff813896b8>] really_probe+0x68/0x200
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423198] [<ffffffff81389982>] driver_probe_device+0x22/0x30
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423200] [<ffffffff81389a3b>] __driver_attach+0xab/0xb0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423202] [<ffffffff81389990>] ? driver_probe_device+0x30/0x30
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423205] [<ffffffff81387c46>] bus_for_each_dev+0x56/0x90
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423207] [<ffffffff813892fe>] driver_attach+0x1e/0x20
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423210] [<ffffffff81388ed0>] bus_add_driver+0x1a0/0x270
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423216] [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423218] [<ffffffff81389f86>] driver_register+0x76/0x130
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423223] [<ffffffff8157aa9d>] ? notifier_call_chain+0x4d/0x70
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423227] [<ffffffff8109f0b0>] ? add_kallsyms+0x1e0/0x1e0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423233] [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423235] [<ffffffff812e7d85>] __pci_register_driver+0x55/0xd0
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423241] [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423246] [<ffffffffa01d20dd>] mlx4_init+0xac/0xec [mlx4_core]
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423250] [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423253] [<ffffffff810a18bf>] sys_init_module+0x8f/0x200
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423257] [<ffffffff8157f0a9>] system_call_fastpath+0x16/0x1b
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423259] ---[ end trace 8886e8f0c535939d ]---
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 7.423263] mlx4_core 0000:86:00.0: Failed to initialize memory region table, aborting.
> > Mar 7 03:12:27 bi-heca-02 kernel: [ 8.431444] mlx4_core: probe of 0000:86:00.0 failed with error -12
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
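
On the MAX_ORDER question, here is a rough sketch of the arithmetic rather than anything taken from the driver source. It assumes 4 KiB pages, MAX_ORDER = 11 (a common default; the real value depends on the kernel configuration), and a buddy bitmap that needs one bit per entry at its lowest level. The helper names alloc_order, bitmap_bytes and fits_page_allocator are hypothetical, and the buddy orders in the loop are illustrative values, not ones measured on this server.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
MAX_ORDER = 11    # assumption: common default, check your kernel config


def alloc_order(nbytes):
    """Smallest order such that (PAGE_SIZE << order) >= nbytes."""
    pages = (nbytes + PAGE_SIZE - 1) // PAGE_SIZE
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def bitmap_bytes(buddy_order, word_bytes=8):
    """Bytes for a bitmap with 2**buddy_order bits, rounded up to whole words."""
    bits = 1 << buddy_order
    bits_per_word = word_bytes * 8
    words = (bits + bits_per_word - 1) // bits_per_word
    return words * word_bytes


def fits_page_allocator(nbytes):
    """True if a physically contiguous allocation of nbytes stays below MAX_ORDER."""
    return alloc_order(nbytes) < MAX_ORDER


if __name__ == "__main__":
    # Illustrative buddy orders only (hypothetical values).
    for buddy_order in (20, 24, 25, 26, 28):
        size = bitmap_bytes(buddy_order)
        print(buddy_order, size, alloc_order(size), fits_page_allocator(size))
```

Under these assumptions the largest contiguous request that passes is 4 MiB (order 10), so a bitmap for a buddy order of 26 or more would be refused. If the value passed to mlx4_buddy_init grows with installed RAM, that would be consistent with 512GB working and 1TB failing, but that remains to be confirmed against the actual values.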
