Hal Rosenstock wrote:
Hi Rob,

On Tue, Nov 25, 2008 at 10:21 AM, Robert Dunkley <[EMAIL PROTECTED]> wrote:
Hi Hal,

Thanks again, I will try this in a minute. I think I have found the
moment it went bad on Machine A, using dmesg:
ib_mthca 0000:87:00.0: Catastrophic error detected: unknown error

Definitely need to reset mthca after this.

ib_mthca 0000:87:00.0:   buf[00]: ffffffff
ib_mthca 0000:87:00.0:   buf[01]: ffffffff
ib_mthca 0000:87:00.0:   buf[02]: ffffffff
ib_mthca 0000:87:00.0:   buf[03]: ffffffff
ib_mthca 0000:87:00.0:   buf[04]: ffffffff
ib_mthca 0000:87:00.0:   buf[05]: ffffffff
ib_mthca 0000:87:00.0:   buf[06]: ffffffff
ib_mthca 0000:87:00.0:   buf[07]: ffffffff
ib_mthca 0000:87:00.0:   buf[08]: ffffffff
ib_mthca 0000:87:00.0:   buf[09]: ffffffff
ib_mthca 0000:87:00.0:   buf[0a]: ffffffff
ib_mthca 0000:87:00.0:   buf[0b]: ffffffff
ib_mthca 0000:87:00.0:   buf[0c]: ffffffff
ib_mthca 0000:87:00.0:   buf[0d]: ffffffff
ib_mthca 0000:87:00.0:   buf[0e]: ffffffff
ib_mthca 0000:87:00.0:   buf[0f]: ffffffff
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib0: ib_query_gid() failed
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib0: ib_query_port failed
ib0: Failed to modify QP to ERROR state
ib0: timing out; 1 sends 250 receives not completed
ib0: Failed to modify QP to RESET state
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_CQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_SRQ failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)
ib_mthca 0000:87:00.0: HW2SW_MPT failed (-11)

Does this help to pinpoint what might have caused this?

The ffffffff values in the buf dump show that you have some kind of PCI bus error. The mthca driver then moved to error mode, so no further commands will be executed. I suggest you check that the card has not moved in the system, and you had better reboot the system again.
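
A minimal sketch of the check described above, assuming the dmesg output has already been captured as text. It collects the buf[..] words per PCI address and reports the devices whose dump is entirely ffffffff, which is what reads from a device that has stopped responding on the bus typically return. The function name all_ones_devices and the regular expression are illustrative assumptions, not part of the mthca driver or any OFED tool.

```python
import re

# Matches lines such as "ib_mthca 0000:87:00.0:   buf[0a]: ffffffff".
# The pattern is an assumption based on the log format quoted in this thread.
BUF_LINE = re.compile(
    r"ib_mthca (?P<dev>[0-9a-f:.]+):\s+buf\[(?P<idx>[0-9a-f]{2})\]: (?P<val>[0-9a-f]{8})"
)


def all_ones_devices(dmesg_text):
    """Return PCI addresses whose dumped buf words are all ffffffff."""
    words = {}
    for line in dmesg_text.splitlines():
        match = BUF_LINE.search(line)
        if match:
            # Group the dumped words by the PCI address that reported them.
            words.setdefault(match.group("dev"), []).append(match.group("val"))
    # A device is flagged only if every word it dumped is all ones.
    return sorted(
        dev for dev, vals in words.items() if all(v == "ffffffff" for v in vals)
    )


if __name__ == "__main__":
    sample = (
        "ib_mthca 0000:87:00.0:   buf[00]: ffffffff\n"
        "ib_mthca 0000:87:00.0:   buf[01]: ffffffff\n"
    )
    print(all_ones_devices(sample))  # prints ['0000:87:00.0']
```

A device flagged this way points toward a bus or slot problem rather than a specific firmware error code, which matches the advice to check the card and reboot. The -11 on the HW2SW_* lines is the Linux errno value for EAGAIN.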

Tziporet

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
