Looks like https://bugs.openfabrics.org/show_bug.cgi?id=431 to me, which is fixed in OFED-1.2-20070411-0938 or newer.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pradeep Satyanarayana
> Sent: Monday, April 16, 2007 5:19 PM
> To: [email protected]; Michael S. Tsirkin; Roland Dreier (rdreier)
> Subject: [ofa-general] Next set of mthca issues
>
> Here is the stack trace that I see after I upgraded to the latest version
> (3.5) of the FW. Now the version of FW is not displayed in
> /var/log/messages. Is that because FW version is at the "expected level"?
> However, /sys/class/infiniband/mthca0/fw_ver does indicate it is 3.5.
>
> ping seems to work fine, but I run into problems with netperf (especially
> when it is the receiver, i.e. running netserver).
> I am running these tests on a ppc64 machine.
>
> Pradeep
> [EMAIL PROTECTED]
>
>
> Apr 16 19:37:49 elm3b37 kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> Apr 16 19:37:49 elm3b37 kernel: ib_mthca: Initializing 0002:d9:00.0
> Apr 16 19:37:53 elm3b37 kernel: ADDRCONF(NETDEV_UP): ib1: link is not ready
> Apr 16 19:38:02 elm3b37 kernel: ib0: enabling connected mode will cause multicast packet drops
> Apr 16 19:38:05 elm3b37 kernel: ib0: mtu > 2044 will cause multicast packet drops.
> Apr 16 19:46:25 elm3b37 kernel: Call Trace:
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3BB0] [C00000000000F884] .show_stack+0x54/0x1f0 (unreliable)
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3C60] [C0000000000415EC] .eeh_dn_check_failure+0x2bc/0x320
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3D10] [C0000000000416E4] .eeh_check_failure+0x94/0x170
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3D90] [D00000000025ACEC] .mthca_tavor_interrupt+0x1cc/0x1e0 [ib_mthca]
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3E50] [C00000000008C180] .handle_IRQ_event+0x70/0x100
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3EF0] [C00000000008EAB0] .handle_fasteoi_irq+0xd0/0x200
> Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3F90] [C000000000028638] .call_handle_irq+0x1c/0x2c
> Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FA50] [C00000000000CCA0] .do_IRQ+0xc0/0x1e0
> Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FAE0] [C000000000004270] hardware_interrupt_entry+0x18/0x28
> Apr 16 19:46:25 elm3b37 kernel: --- Exception: 501 at .pseries_dedicated_idle_sleep+0xd4/0x1a0
> Apr 16 19:46:25 elm3b37 kernel:     LR = .pseries_dedicated_idle_sleep+0xd0/0x1a0
> Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FDD0] [0000000000000000] .__start+0x4000000000000000/0x8 (unreliable)
> Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FE70] [C00000000001200C] .cpu_idle+0x13c/0x250
> Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FF00] [C00000000002B16C] .start_secondary+0x14c/0x190
> Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FF90] [C000000000008364] .start_secondary_prolog+0xc/0x10
> Apr 16 19:46:25 elm3b37 kernel: EEH: Detected PCI bus error on device 0002:d9:00.0
> Apr 16 19:46:25 elm3b37 kernel: EEH: This PCI device has failed 1 times since last reboot: location=U7879.001.DQD1EKZ-P1-C2 driver=ib_mthca pci addr=0002:d9:00.0
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: Catastrophic error detected: unknown error
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[00]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[01]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[02]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[03]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[04]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[05]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[06]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[07]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[08]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[09]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[0a]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[0b]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[0c]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[0d]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[0e]: ffffffff
> Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0:   buf[0f]: ffffffff
> Apr 16 19:46:35 elm3b37 kernel: ib_mthca 0002:d9:00.0: HW2SW_MPT failed (-11)
> Apr 16 19:47:05 elm3b37 last message repeated 3 times
> Apr 16 19:47:05 elm3b37 last message repeated 3 times
> Apr 16 19:47:15 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
> Apr 16 19:47:15 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
> Apr 16 19:47:25 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
> Apr 16 19:47:25 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
> Apr 16 19:47:35 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
> Apr 16 19:47:35 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
> Apr 16 19:47:45 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
> Apr 16 19:47:45 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
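
The reply at the top of this message says the bug is fixed in OFED-1.2-20070411-0938 or newer. As a rough aid for checking an installed build against that threshold, here is a minimal Python sketch. It assumes the build name follows the pattern OFED-<version>-<YYYYMMDD>-<HHMM>, which is inferred only from the one name quoted in this thread; the function names are hypothetical, and the comparison looks only at the date stamp, so it is meaningful only among builds of the same release series.

```python
import re
from datetime import datetime

# Build named in the reply as the first one containing the fix.
FIXED_BUILD = "OFED-1.2-20070411-0938"

# Assumed naming scheme (inferred from the single name above, not from
# OFED documentation): OFED-<version>-<YYYYMMDD>-<HHMM>
_BUILD_RE = re.compile(
    r"^OFED-(?P<version>[\d.]+)-(?P<date>\d{8})-(?P<time>\d{4})$"
)


def build_timestamp(name):
    """Return the date stamp embedded in a build name as a datetime."""
    match = _BUILD_RE.match(name.strip())
    if match is None:
        raise ValueError("unrecognized build name: %r" % name)
    return datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M")


def has_fix(name, fixed=FIXED_BUILD):
    """True if the build's date stamp is at or after the fixed build's."""
    return build_timestamp(name) >= build_timestamp(fixed)


if __name__ == "__main__":
    # The second and third names are made-up examples, not real builds.
    for candidate in (FIXED_BUILD,
                      "OFED-1.2-20070416-0700",
                      "OFED-1.2-20070405-0938"):
        print(candidate, has_fix(candidate))
```

The version field is parsed but deliberately not compared, since nothing in this thread says how builds from different release series relate to the fix.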
