Hi Or,

We didn't change that code; our code sits above the rdma_ucm layer (we do
not touch any of the core RDMA functions or drivers, we just use them).
We are using the default OFED setup (the drivers are loaded with the default
configuration) and there is nothing special about it.
I will investigate the MAX_ORDER aspect as soon as possible and also test
with 3.9-rc1.

However, I did a quick test: after physically removing half of the server's
RAM (going from 1 TB to 512 GB), everything works fine.
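
To make the MAX_ORDER question concrete, here is a minimal sketch (Python,
purely illustrative) of how an allocation size maps to a page order and when
the page allocator refuses it. The constants are assumptions: 4 KiB pages and
MAX_ORDER = 11, the usual x86_64 default unless the kernel config overrides
it. buddy_bitmap_bytes() is a hypothetical helper that assumes one bit per
tracked segment; the real sizes depend on the value mlx4_init_mr_table passes
to mlx4_buddy_init, which I have not checked yet.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86_64)
MAX_ORDER = 11    # assumption: common default; check the kernel config


def get_order(size):
    """Smallest order such that (PAGE_SIZE << order) >= size."""
    pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def buddy_bitmap_bytes(max_order):
    """Hypothetical: bytes for a bitmap with one bit per segment,
    for 2**max_order segments, rounded up to 8-byte longs."""
    bits = 1 << max_order
    longs = (bits + 63) // 64
    return longs * 8


def fits_page_allocator(size):
    """The page allocator warns and fails when order >= MAX_ORDER."""
    return get_order(size) < MAX_ORDER


if __name__ == "__main__":
    for max_order in (24, 25, 26):
        size = buddy_bitmap_bytes(max_order)
        print(max_order, size, get_order(size), fits_page_allocator(size))
```

Under these assumptions, a single physically contiguous allocation larger
than 4 MiB would hit the warning. Whether the size requested by the driver
actually depends on the amount of RAM is something I still need to confirm.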


Regards
Benoit



> -----Original Message-----
> From: Or Gerlitz [mailto:[email protected]]
> Sent: 07 March 2013 15:34
> To: Hudzia, Benoit
> Cc: [email protected]; Jack Morgenstein
> Subject: Re: mlx4 module loading fail
> 
> On 07/03/2013 13:18, Hudzia, Benoit wrote:
> > I am currently experiencing some trouble with my ConnectX-2 cards. I have
> > been testing with smaller servers without any problem, and this week I
> > upgraded to a beefier machine. However, I am unable to set up the IB
> > card with our current kernel.
> > The server specs are as follows:
> >     * 4x 10 core Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz stepping 02
> >     * 1TB of RAM
> >     * 1 ConnectX-2 IB
> >
> > Kernel version: 3.5.0. Note that if I downgrade to a 3.2 kernel I do not
> > experience this issue. However, I am forced to work with 3.5 or higher.
> > Can somebody help me with that?
> 
> Hi Benoit,
> 
> As was suggested here, can you try 3.8 or 3.9-rc1? This will help a lot
> to isolate the problem. But even before that: the warning you are
> getting is caused by an allocation with order > MAX_ORDER. What is
> MAX_ORDER under your configuration, and what value do you pass to
> mlx4_buddy_init from mlx4_init_mr_table (did you modify that code)?
> 
> Or.
> 
> >
> > Kernel log trace:
> >
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423038] ------------[ cut here ]------------
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423049] WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x2b9/0x810()
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423050] Hardware name: QSSC-S4R
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423051] Modules linked in: joydev coretemp kvm_intel kvm microcode pcspkr ixgbe mlx4_core(+) igb mdio ioatdma i2c_i801 hid_generic lpc_ich i2c_core mfd_core dca tpm_tis tpm tpm_bios acpi_memhotplug evbug crc32c_intel megaraid_sas usbhid hid
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423078] Pid: 949, comm: modprobe Not tainted 3.5.0-heca-dev-34dd48a+ #29
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423079] Call Trace:
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423088]  [<ffffffff8104baef>] warn_slowpath_common+0x7f/0xc0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423091]  [<ffffffff8104bb4a>] warn_slowpath_null+0x1a/0x20
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423093]  [<ffffffff811028b9>] __alloc_pages_nodemask+0x2b9/0x810
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423096]  [<ffffffff81102785>] ? __alloc_pages_nodemask+0x185/0x810
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423101]  [<ffffffff81137086>] alloc_pages_current+0xb6/0x120
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423105]  [<ffffffff810fe02e>] __get_free_pages+0xe/0x40
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423108]  [<ffffffff8113fcff>] kmalloc_order_trace+0x3f/0xd0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423110]  [<ffffffff810fe02e>] ? __get_free_pages+0xe/0x40
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423113]  [<ffffffff811405e0>] __kmalloc+0x100/0x160
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423131]  [<ffffffffa01ba35d>] mlx4_buddy_init+0xed/0x1a0 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423140]  [<ffffffffa01bb8aa>] mlx4_init_mr_table+0xca/0x150 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423148]  [<ffffffffa01b6fa7>] mlx4_setup_hca.part.12+0xf7/0x4e0 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423156]  [<ffffffffa01aaeef>] ? mlx4_bitmap_init+0x8f/0xb0 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423164]  [<ffffffffa01b73bb>] mlx4_setup_hca+0x2b/0x70 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423172]  [<ffffffffa01b7ba4>] __mlx4_init_one+0x744/0x960 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423179]  [<ffffffffa01c55b6>] mlx4_init_one+0x3d/0x42 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423186]  [<ffffffff812e6e56>] pci_call_probe+0x96/0xb0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423189]  [<ffffffff812e8019>] pci_device_probe+0x79/0xa0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423194]  [<ffffffff813894fa>] ? driver_sysfs_add+0x7a/0xb0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423196]  [<ffffffff813896b8>] really_probe+0x68/0x200
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423198]  [<ffffffff81389982>] driver_probe_device+0x22/0x30
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423200]  [<ffffffff81389a3b>] __driver_attach+0xab/0xb0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423202]  [<ffffffff81389990>] ? driver_probe_device+0x30/0x30
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423205]  [<ffffffff81387c46>] bus_for_each_dev+0x56/0x90
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423207]  [<ffffffff813892fe>] driver_attach+0x1e/0x20
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423210]  [<ffffffff81388ed0>] bus_add_driver+0x1a0/0x270
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423216]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423218]  [<ffffffff81389f86>] driver_register+0x76/0x130
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423223]  [<ffffffff8157aa9d>] ? notifier_call_chain+0x4d/0x70
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423227]  [<ffffffff8109f0b0>] ? add_kallsyms+0x1e0/0x1e0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423233]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423235]  [<ffffffff812e7d85>] __pci_register_driver+0x55/0xd0
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423241]  [<ffffffffa01d2031>] ? mlx4_catas_init+0x31/0x31 [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423246]  [<ffffffffa01d20dd>] mlx4_init+0xac/0xec [mlx4_core]
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423250]  [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423253]  [<ffffffff810a18bf>] sys_init_module+0x8f/0x200
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423257]  [<ffffffff8157f0a9>] system_call_fastpath+0x16/0x1b
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423259] ---[ end trace 8886e8f0c535939d ]---
> > Mar  7 03:12:27 bi-heca-02 kernel: [    7.423263] mlx4_core 0000:86:00.0: Failed to initialize memory region table, aborting.
> > Mar  7 03:12:27 bi-heca-02 kernel: [    8.431444] mlx4_core: probe of 0000:86:00.0 failed with error -12
> >
