Hi.

While running the SDP tests (stress_connect), I got a kernel oops in ib_umad on my machine.

Here are the machine properties:
*************************************************************
Host Name         : sw112/3
Host Architecture : x86_64
Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
Kernel Version    : 2.6.16.21-0.8-smp
GCC Version       : gcc (GCC) 4.1.0 (SUSE Linux)
Memory size       : 4049452 kB
Number of CPUs    : 4
cpu MHz           : 3192.308
MST Version       : 4.4.3
Driver Version    : ofa_1_3_dev-20071126-0855
HCA ID(s)         : mlx4_0
HCA model(s)      : 25418
Board(s)          : MT_04A0110002
*************************************************************

Here is the dump from /var/log/messages:
Nov 27 09:26:32 sw112 OpenSM[24713]: Exiting SM
Nov 27 09:26:32 sw112 kernel: general protection fault: 0000 [1] SMP
Nov 27 09:26:32 sw112 kernel: last sysfs file: /class/net/ib0/address
Nov 27 09:26:32 sw112 kernel: CPU 2
Nov 27 09:26:32 sw112 kernel: Modules linked in: mst_pciconf mst_pci rdma_ucm rds ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa ib_uverbs ib_umad mlx4_ib mlx4_core ib_mthca ib_mad ib_core memtrack autofs4 ipv6 nfs lockd nfs_acl sunrpc af_packet button battery ac apparmor aamatch_pcre loop dm_mod ide_cd uhci_hcd ehci_hcd cdrom shpchp pci_hotplug hw_random i8xx_tco usbcore e1000 ext3 jbd edd fan thermal processor sg mptspi mptscsih mptbase scsi_transport_spi piix sd_mod scsi_mod ide_disk ide_core
Nov 27 09:26:32 sw112 kernel: Pid: 24713, comm: opensm Tainted: PF U 2.6.16.21-0.8-smp #1
Nov 27 09:26:32 sw112 kernel: RIP: 0010:[<ffffffff8837d39f>] <ffffffff8837d39f>{:ib_umad:dequeue_send+26}
Nov 27 09:26:32 sw112 kernel: RSP: 0018:ffff8100c0d9fde8  EFLAGS: 00010046
Nov 27 09:26:32 sw112 kernel: RAX: ffff8100c1a95658 RBX: 3f40a6f32b5a2004 RCX: 3f40a6f32b5a2014
Nov 27 09:26:32 sw112 kernel: RDX: ffff8100c0d9fe58 RSI: 3f40a6f32b5a2004 RDI: ffff81007401ac3c
Nov 27 09:26:32 sw112 kernel: RBP: 3f40a6f32b5a2004 R08: 0000000000000206 R09: 00000000000007d7
Nov 27 09:26:32 sw112 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: ffff81007401ac00
Nov 27 09:26:32 sw112 kernel: R13: ffff81007401a210 R14: 0000000000000005 R15: 0000000000000000
Nov 27 09:26:32 sw112 kernel: FS: 00002b13822edef0(0000) GS:ffff81012bd6b340(0000) knlGS:0000000000000000
Nov 27 09:26:32 sw112 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 27 09:26:32 sw112 kernel: CR2: 00000000005d99c0 CR3: 0000000037079000 CR4: 00000000000006e0
Nov 27 09:26:32 sw112 kernel: Process opensm (pid: 24713, threadinfo ffff8100c0d9e000, task ffff8100cd8047d0)
Nov 27 09:26:32 sw112 kernel: Stack: ffff81012d706b10 ffff8100c0d9fe68 ffff81007401ac00 ffffffff8837d4b1
Nov 27 09:26:32 sw112 kernel: 0000000000000296 ffff8100c0d9fe40 ffff81007401a210 ffff81007401a200
Nov 27 09:26:32 sw112 kernel:        0000000000000005 ffffffff8827261e
Nov 27 09:26:32 sw112 kernel: Call Trace: <ffffffff8837d4b1>{:ib_umad:send_handler+38}
Nov 27 09:26:32 sw112 kernel: <ffffffff8827261e>{:ib_mad:ib_unregister_mad_agent+359}
Nov 27 09:26:32 sw112 kernel: <ffffffff8837d26b>{:ib_umad:ib_umad_unreg_agent+121}
Nov 27 09:26:32 sw112 kernel: <ffffffff8837db37>{:ib_umad:ib_umad_ioctl+74} <ffffffff8018b6b9>{do_ioctl+33}
Nov 27 09:26:32 sw112 kernel: <ffffffff8018b94b>{vfs_ioctl+584} <ffffffff801e7e6b>{__up_write+33}
Nov 27 09:26:32 sw112 kernel: <ffffffff8018b9c6>{sys_ioctl+98} <ffffffff8010a7be>{system_call+126}
Nov 27 09:26:32 sw112 kernel:
Nov 27 09:26:32 sw112 kernel: Code: 48 8b 53 10 48 8b 41 08 48 89 42 08 48 89 10 48 c7 41 08 00
Nov 27 09:26:32 sw112 kernel: RIP <ffffffff8837d39f>{:ib_umad:dequeue_send+26} RSP <ffff8100c0d9fde8>



Here is the dump of /var/log/opensm.log:

Nov 27 09:26:44 546327 [D6AC7EF0] 0x03 -> OpenSM 3.1.7
Nov 27 09:26:44 546407 [D6AC7EF0] 0x80 -> OpenSM 3.1.7
Nov 27 09:26:44 547422 [D6AC7EF0] 0x02 -> osm_vendor_bind: Binding to port 0x4025
Nov 27 09:26:44 673957 [D6AC7EF0] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Nov 27 09:26:44 674032 [D6AC7EF0] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Nov 27 09:26:44 674057 [D6AC7EF0] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Nov 27 09:26:44 674089 [D6AC7EF0] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Nov 27 09:26:44 675165 [D6AC7EF0] 0x80 -> Exiting SM


Can you check this issue?

Thanks,
Dotan
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general