The following console output appeared immediately before a panic on a system running Lustre with OFED 1.1. How can I find out what it means?
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: Catastrophic error detected: internal error
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[00]: 001d79f4
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[01]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[02]: 00198538
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[03]: 00136038
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[04]: 00207730
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[05]: 001d79cc
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[06]: 0023cf24
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[07]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[08]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[09]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0a]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0b]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0c]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0d]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0e]: 00000000
2007-02-21 12:02:42 ib_mthca 0000:07:00.0: buf[0f]: 00000000

Shortly before this happens, the Lustre/LNET OFED driver receives a number of what I believe are duplicate SEND completion events. The problem seems quite sporadic and doesn't appear to follow any particular piece of hardware.

More info at https://bugzilla.lustre.org/show_bug.cgi?id=11381

Cheers,
Eric
