Pawel Dziekonski wrote:
Hi,

From time to time I get catastrophic errors like the ones below. The software
stack is kernel 2.6.18-92.1.10.el5 with a Lustre client. Device and OFED info
are also below.

Any hints?

Thanks in advance, Pawel



06:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

# ibv_devices
    device                 node GUID
    ------              ----------------
    mthca0              0030487e07700000
# ibv_devinfo
hca_id: mthca0
        fw_ver:                         1.2.0
        node_guid:                      0030:487e:0770:0000
        sys_image_guid:                 0030:487e:0770:0003
        vendor_id:                      0x02c9
        vendor_part_id:                 25204
        hw_ver:                         0xA0
        board_id:                       SM_0000000003
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 1
                        port_lid:               441
                        port_lmc:               0x00





kernel: ib_mthca 0000:06:00.0: Catastrophic error detected: unknown error
kernel: ib_mthca 0000:06:00.0:   buf[00]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[01]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[02]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[03]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[04]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[05]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[06]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[07]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[08]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[09]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[0a]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[0b]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[0c]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[0d]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[0e]: ffffffff
kernel: ib_mthca 0000:06:00.0:   buf[0f]: ffffffff
kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
kernel: ib0: ib_detach_mcast failed (result = -11)
kernel: ib0: ipoib_mcast_detach failed (result = -11)
kernel: ib0: ib_detach_mcast failed (result = -11)
kernel: ib0: ipoib_mcast_detach failed (result = -11)
kernel: ib0: Failed to modify QP to ERROR state
kernel: ib0: timing out; 0 sends 128 receives not completed
kernel: ib0: Failed to modify QP to RESET state
kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
kernel: ib_mthca 0000:06:00.0: HW2SW_CQ failed (-11)
kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)
kernel: ib_mthca 0000:06:00.0: HW2SW_SRQ failed (-11)
kernel: ib_mthca 0000:06:00.0: HW2SW_MPT failed (-11)


kernel: ib_mthca 0000:01:00.0: Catastrophic error detected: internal parity error
kernel: ib_mthca 0000:01:00.0:   buf[00]: 05000000
kernel: ib_mthca 0000:01:00.0:   buf[01]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[02]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[03]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[04]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[05]: 00127f2c
kernel: ib_mthca 0000:01:00.0:   buf[06]: 000a0056
kernel: ib_mthca 0000:01:00.0:   buf[07]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[08]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[09]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[0a]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[0b]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[0c]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[0d]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[0e]: 00000000
kernel: ib_mthca 0000:01:00.0:   buf[0f]: 00000000
kernel: ib0: ib_query_port failed
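
When collecting such reports from many nodes, dumps like the two above can be summarized with a short script. The following is a minimal sketch, not something from the original report: the function name `summarize_catas` and the `all_ones` flag are hypothetical, and the regular expressions assume the exact message layout shown in this log. A dump in which every word reads ffffffff may mean the register reads themselves failed (for example, the device stopped responding on the bus), but that interpretation is an assumption and is not stated by the log.

```python
import re

# Header line as printed in the log above, e.g.
# "kernel: ib_mthca 0000:06:00.0: Catastrophic error detected: unknown error"
HEADER = re.compile(r"ib_mthca (\S+): Catastrophic error detected: (.+)")
# One dumped word, e.g. "kernel: ib_mthca 0000:06:00.0:   buf[00]: ffffffff"
WORD = re.compile(r"ib_mthca (\S+):\s+buf\[([0-9a-f]{2})\]: ([0-9a-f]{8})")


def summarize_catas(log_text):
    """Group catastrophic-error dumps by event (hypothetical helper).

    Returns a list of dicts with the PCI device, the error type string
    reported by the driver, the dumped words, and an 'all_ones' flag.
    """
    events = []
    for line in log_text.splitlines():
        m = HEADER.search(line)
        if m:
            events.append({"device": m.group(1),
                           "type": m.group(2).strip(),
                           "buf": []})
            continue
        m = WORD.search(line)
        # Attach a word only to the most recent event of the same device.
        if m and events and events[-1]["device"] == m.group(1):
            events[-1]["buf"].append(int(m.group(3), 16))
    for ev in events:
        # Flag dumps where every word is 0xffffffff.
        ev["all_ones"] = bool(ev["buf"]) and all(
            w == 0xFFFFFFFF for w in ev["buf"])
    return events
```

Feeding it the saved kernel log (with any wrapped lines rejoined) gives one entry per event, which makes it easy to see that the two dumps above come from different PCI devices and report different error types.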


This is a known issue with InfiniHost III HCA firmware 1.2.0.
Please contact Mellanox support to get an updated firmware version.

Tziporet
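
To find other nodes running the firmware version named above, the `ibv_devinfo` output can be scanned for `fw_ver`. This is a rough sketch under stated assumptions: the helper name `find_affected_hcas` is hypothetical, it compares only the version string 1.2.0 mentioned in the reply (it does not check the HCA model), and it assumes the `hca_id:` / `fw_ver:` layout shown in the report.

```python
import re

# Version named in the reply; assumed here to be the only one of interest.
AFFECTED_FW = "1.2.0"


def find_affected_hcas(devinfo_text, affected=AFFECTED_FW):
    """Return the hca_id of every device whose fw_ver equals `affected`.

    `devinfo_text` is the text printed by `ibv_devinfo` (hypothetical helper;
    the parsing relies on the 'hca_id:' and 'fw_ver:' lines seen above).
    """
    hits = []
    current_hca = None
    for line in devinfo_text.splitlines():
        m = re.match(r"\s*hca_id:\s*(\S+)", line)
        if m:
            current_hca = m.group(1)
            continue
        m = re.match(r"\s*fw_ver:\s*(\S+)", line)
        if m and m.group(1) == affected:
            hits.append(current_hca)
    return hits
```

The input can be a saved copy of the `ibv_devinfo` output from each node; a match only indicates that the version string is the same, so the actual fix still has to come from Mellanox support as described in the reply.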

_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
