Olivier,

I am having similar issues with the same firmware.
Can you give me some more details?

Did you make the changes on the driver side or in the application?
If on the driver, can you point me in the right direction to make those
changes?

Thanks,
Todd

On 4/10/07, Olivier Cozette <[EMAIL PROTECTED]> wrote:

        Hi,

I had the same error with my driver, and after some investigation I found
that my srq depth and cq depth were too small to handle the maximum number
of send/recv that my application can generate concurrently. Normally, in that
case the qp should move to the error state, but instead a catastrophic
error occurs.

I increased the srq/cq depth to meet the maximum number of send/recv that my
application can generate concurrently (without reply/synchronization), and
the bug no longer occurs.

So you probably just need to increase your srq/cq depth and the number of
posted buffers to meet the maximum number of send/recv that your driver can
issue.
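
As a rough sketch of the sizing rule described above (not taken from any
actual driver): the struct, its fields, and the function names below are all
hypothetical, and the sketch assumes a single cq and a single srq shared by
all qps. The resulting depths are what one would pass as the cqe count when
creating the cq and as max_wr in the srq attributes, after checking them
against the max_cqe and max_srq_wr limits the device reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worst-case load description; every field is an assumption
 * made for illustration. */
struct queue_load {
    uint32_t num_qps;             /* qps sharing one cq and one srq */
    uint32_t max_sends_per_qp;    /* sends a qp may have outstanding at once */
    uint32_t max_recvs_in_flight; /* receives that may complete before buffers are reposted */
};

/* The cq needs room for one completion per outstanding send plus one per
 * receive that can complete before the consumer drains the queue. */
static uint64_t required_cq_depth(const struct queue_load *l)
{
    assert(l != NULL);
    return (uint64_t)l->num_qps * l->max_sends_per_qp + l->max_recvs_in_flight;
}

/* The srq needs one posted buffer per receive that can arrive before the
 * consumer gets a chance to repost. */
static uint64_t required_srq_depth(const struct queue_load *l)
{
    assert(l != NULL);
    return l->max_recvs_in_flight;
}

/* Returns 1 if the needed depth fits within the limit reported by the
 * device (e.g. max_cqe or max_srq_wr), 0 otherwise. */
static int depth_fits(uint64_t needed, uint64_t device_max)
{
    return needed <= device_max;
}
```

If depth_fits() returns 0, the concurrency of the application has to be
reduced (or the load split over several queues) rather than the depth
silently truncated.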

        Olivier

Note: I have an MT25204 rev a0 with firmware 1.2.0.

On Tuesday 20 March 2007 18:59, Eric Barton wrote:
> The following is console output immediately before a panic on a system
> running lustre with OFED 1.1.  How can I find out what it means?
>
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0: Catastrophic error detected: internal error
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[00]: 001d79f4
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[01]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[02]: 00198538
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[03]: 00136038
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[04]: 00207730
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[05]: 001d79cc
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[06]: 0023cf24
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[07]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[08]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[09]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[0a]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[0b]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[0c]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[0d]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[0e]: 00000000
> 2007-02-21 12:02:42 ib_mthca 0000:07:00.0:   buf[0f]: 00000000
>
> ...shortly before it happens, the lustre/lnet OFED driver receives a number
> of what I believe to be duplicate SEND completion events.  It seems quite
> sporadic, and doesn't appear to track hardware.
>
> More info at https://bugzilla.lustre.org/show_bug.cgi?id=11381
>
>                 Cheers,
>                         Eric
>
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
