What kernel and OFED version is it?

Jack Morgenstein wrote:
We saw the following kernel panic when testing ipoib stability intensively
by simultaneously (i.e., in separate processes, with random wait intervals)
doing:
 - ifconfig up/down
 - opensm up/down
 - ipoib ping
 - arp delete
 - driver up/down

ib0: ib_sa_path_rec_get failed: -11
ib0: ib_sa_path_rec_get failed: -11
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff883ac404>] :ib_ipoib:ipoib_mark_paths_invalid+0xbc/0xec
PGD 224ea0067 PUD 225ae9067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /class/infiniband/mlx4_0/ports/2/pkeys/0
CPU 2
Modules linked in: netconsole nfsd exportfs autofs4 hidp nfs lockd fscache
 nfs_acl rfcomm l2cap bluetooth sunrpc rdma_ucm(U) ib_sdp(U) rdma_cm(U)
 iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6
 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mthca(U) ib_mad(U) ib_core(U)
 dm_mirror dm_mod video sbs i2c_ec i2c_core button battery asus_acpi
 acpi_memhotplug ac parport_pc lp parport mlx4_core(U) ide_cd sg k8_edac
 cdrom edac_mc bnx2 shpchp serio_raw pcspkr sata_svw libata megaraid_sas
 sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 2051, comm: ipoib Not tainted 2.6.18-8.el5 #1
RIP: 0010:[<ffffffff883ac404>]  [<ffffffff883ac404>] :ib_ipoib:ipoib_mark_paths_invalid+0xbc/0xec
RSP: 0018:ffff810121ee7de0  EFLAGS: 00010046
RAX: ffff810121ee8538 RBX: ffffffffffffff30 RCX: 0000000000000002
RDX: ffff8102237a1f90 RSI: ffff8102261e90c0 RDI: ffff810121ee8500
RBP: ffff810121ee8500 R08: ffff810121ee6000 R09: 0000000000000000
R10: ffff810005116400 R11: 0000000000000002 R12: ffffffffffffff30
R13: 0000000000000000 R14: ffff810121ee8688 R15: ffffffff883ae8b3
FS:  00002aaaaaace2a0(0000) GS:ffff810127c4f3c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000224eef000 CR4: 00000000000006e0
Process ipoib (pid: 2051, threadinfo ffff810121ee6000, task ffff810227ebb860)
Stack:  ffff810121ee8500 ffff810121ee84f0 ffff810121ee8000 ffffffff883ae850
 ffffffffffffffff 7fffffffffffffff ffffffffffffffff ffff810121ee8688
 ffff810121ee8690 ffff810125d932c0 0000000000000282 ffffffff8004b2b4
Call Trace:
 [<ffffffff883ae850>] :ib_ipoib:__ipoib_ib_dev_flush+0x175/0x1b6
 [<ffffffff8004b2b4>] run_workqueue+0x94/0xe5
 [<ffffffff80047c13>] worker_thread+0x0/0x122
 [<ffffffff8009b4a3>] keventd_create_kthread+0x0/0x61
 [<ffffffff80047d03>] worker_thread+0xf0/0x122
 [<ffffffff80086c5f>] default_wake_function+0x0/0xe
 [<ffffffff8009b4a3>] keventd_create_kthread+0x0/0x61
 [<ffffffff8009b4a3>] keventd_create_kthread+0x0/0x61
 [<ffffffff8003216e>] kthread+0xfe/0x132
 [<ffffffff8005bfe5>] child_rip+0xa/0x11
 [<ffffffff8009b4a3>] keventd_create_kthread+0x0/0x61
 [<ffffffff80032070>] kthread+0x0/0x132
 [<ffffffff8005bfdb>] child_rip+0x0/0x11

Code: 4d 8b a4 24 d0 00 00 00 48 8d 93 d0 00 00 00 48 8d 45 38 49
RIP  [<ffffffff883ac404>] :ib_ipoib:ipoib_mark_paths_invalid+0xbc/0xec
 RSP <ffff810121ee7de0>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception

In objdump -ld, we get:

ipoib_mark_paths_invalid():
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_main.c:365
    13f7:       c7 83 e0 00 00 00 00    movl   $0x0,0xe0(%rbx)
    13fe:       00 00 00
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_main.c:361
    1401:       4c 89 e3                mov    %r12,%rbx
==> 1404:       4d 8b a4 24 d0 00 00    mov    0xd0(%r12),%r12
    140b:       00
    140c:       48 8d 93 d0 00 00 00    lea    0xd0(%rbx),%rdx
    1413:       48 8d 45 38             lea    0x38(%rbp),%rax
    1417:       49 81 ec d0 00 00 00    sub    $0xd0,%r12
    141e:       48 39 c2                cmp    %rax,%rdx
    1421:       0f 85 4b ff ff ff       jne    1372 <ipoib_mark_paths_invalid+0x2a>

--------------------------------
and in the source code, we get:

    void ipoib_mark_paths_invalid(struct net_device *dev)
    {
            struct ipoib_dev_priv *priv = netdev_priv(dev);
            struct ipoib_path *path, *tp;

            spin_lock_irq(&priv->lock);

==>         list_for_each_entry_safe(path, tp, &priv->path_list, list) {
                    ipoib_dbg(priv, "mark path LID 0x%04x GID "
                              IPOIB_GID_FMT " invalid\n",
                              be16_to_cpu(path->pathrec.dlid),
                              IPOIB_GID_ARG(path->pathrec.dgid));
                    path->valid = 0;
            }

            spin_unlock_irq(&priv->lock);
    }

--------------------------------------------

Any ideas?

- Jack
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
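
One thing that can be read straight off the dump: RBX and R12 are both ffffffffffffff30, which is 0 - 0xd0, and the faulting instruction loads from 0xd0(%r12), i.e. address 0, matching CR2. That would be consistent with list_for_each_entry_safe() having picked up a NULL ->next pointer (from the list head or from the previous entry) and turned it into an entry pointer via list_entry(). It does not say who zeroed the pointer. A minimal user-space sketch of that arithmetic follows; struct fake_path and both helper functions are made up for illustration, and the only thing taken from the report is the 0xd0 offset seen in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/*
 * Hypothetical stand-in for struct ipoib_path (NOT the real layout):
 * the padding only serves to place 'list' at offset 0xd0, the
 * displacement that appears in the disassembly above.
 */
struct fake_path {
    unsigned char pad[0xd0];
    struct list_head list;
};

/*
 * The address list_entry()/container_of() would produce for a given
 * list node address. Done in integer arithmetic so the sketch stays
 * well-defined even when the node address is 0.
 */
static uint64_t entry_from_node(uint64_t node)
{
    return node - (uint64_t)offsetof(struct fake_path, list);
}

/*
 * The address read by "mov 0xd0(%r12),%r12", i.e. the location of
 * entry->list.next ('next' is the first member of list_head).
 */
static uint64_t next_load_address(uint64_t entry)
{
    return entry + (uint64_t)offsetof(struct fake_path, list);
}
```

With a node address of 0, entry_from_node() gives 0xffffffffffffff30 (the RBX/R12 value in the oops), and next_load_address() on that result gives 0 (the CR2 value).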
--
--Yossi
