It might be related to the new ConnectX card (with the mlx4_ib module). I have now tried the same program on a machine with only an "mthca" card, and it runs without any problems.
Thanks. I remember someone else on this list also reported a similar issue: ib_reg_phys_mr() fails with the mlx4 module.

On Fri, Feb 20, 2009 at 3:21 AM, Liang Zhen <[email protected]> wrote:
> Hmm, I didn't see any problem in your code. Have you installed
> ofa_kernel_devel (kernel headers of OFED) after installation of
> ofa_kernel_1_3_1?
>
> Regards
> Liang
>
> neutron:
>>
>> I'm using Mellanox HCA 'mthca0' type: MT25208, kernel version:
>> 2.6.18-53.1.14.el5, ofed 1.3.1.
>>
>> The failed function call is like:
>>
>> {
>>
>>     ctx->send_buf = dma_alloc_coherent(ctx->ib_dev->dma_device, MAX_SIZE,
>>                                        &dma_addr, GFP_KERNEL);
>>
>>     ctx->phy_buf[0].addr = dma_addr;
>>     ctx->phy_buf[0].size = MAX_SIZE;
>>     ctx->iovstart = (u64) ctx->send_buf;
>>
>>     printk("pd=%p, phy_buf[0].addr=%p,size=%d, iovstart=%llx\n",
>>            ctx->pd, ctx->phy_buf[0].addr, ctx->phy_buf[0].size, ctx->iovstart
>>     );
>>
>>     send_mr = ib_reg_phys_mr( ctx->pd, &ctx->phy_buf[0], 1,
>>                               IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ
>>                               | IB_ACCESS_LOCAL_WRITE, &(ctx->iovstart));
>> }
>>
>> The phy_buf[0] is a "ib_phys_buf" corresponding to "ctx->send_buf".
>>
>> Below is /var/log/messages output around the crash.
>> ----------------
>> Feb 19 12:50:22 wci30 kernel: pd=ffff8101da3ddce0,
>> phy_buf[0].addr=00000001bbe4b000,size=1024, iovstart=ffff8101bbe4b000
>>
>> Feb 19 12:50:22 wci30 kernel: Unable to handle kernel NULL pointer
>> dereference at 0000000000000000
>> RIP:
>> Feb 19 12:50:22 wci30 kernel: [<0000000000000000>]
>> _stext+0x7ffff000/0x1000
>> Feb 19 12:50:22 wci30 kernel: PGD 1c06d5067 PUD 1c9dcd067 PMD 0
>> Feb 19 12:50:22 wci30 kernel: Oops: 0010 [1] SMP
>> Feb 19 12:50:22 wci30 kernel: last sysfs file: /module/libata/version
>> Feb 19 12:50:22 wci30 kernel: CPU 0
>> Feb 19 12:54:05 wci30 syslogd 1.4.1: restart.
>> Feb 19 12:54:05 wci30 kernel: klogd 1.4.1, log source = /proc/kmsg
>> started.
>> Feb 19 12:54:05 wci30 kernel: Linux version 2.6.18-53.1.14.el5
>> ([email protected]
>> t.com) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Tue Feb
>> 19 07:18:46 EST 2008
>> Feb 19 12:54:05 wci30 kernel: Command line: ro root=LABEL=/ rhgb quiet
>>
>> ====================
>> It's strange that the kernel doesn't print out the function call stack
>> before crashing.
>>
>> Any hints? Thanks a lot!
>>
>> On Wed, Feb 18, 2009 at 7:40 PM, Roland Dreier <[email protected]> wrote:
>>
>>>
>>> > Before calling ib_reg_phys_mr, printk() shows that all its arguments
>>> > are valid. But the system always crashes immediately after entering
>>> > the function ib_reg_phys_mr( ). Any possible reasons ? Thanks!!
>>>
>>> What do you mean by "immediately after entering ib_reg_phys_mr()"? Do
>>> you get an oops message? If so that would be very important info for
>>> debugging this.
>>>
>>> - R.
>>>
>>
>> _______________________________________________
>> general mailing list
>> [email protected]
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
