We have had several tickets submitted by users since we started adding QLogic 7240 cards into our cluster, which is mostly Mellanox (we have a couple of different cards). We have looked at the codes (MPI based), and they do run fine when the QLogic cards are excluded. QLogic suggests using PSM or IPoIB on our cluster - both of which seem like a punt to us, as PSM doesn't make sense with Mellanox and IPoIB is not a solution.

Right now, we are trying to figure out where the problem is - it is not at the application level, as we have distilled it down to a specific case that reliably triggers the problem (MPI all-to-all, for example). However, some things do seem clear to us:

1. the test case works over verbs on Mellanox only
2. the test case works over PSM on QLogic only
3. the test case fails over verbs between Mellanox and QLogic
4. the test case fails over verbs on QLogic only

Is this a verbs-level issue with the ipath stuff, or an MPI problem? Or is the issue somewhere else? There was some discussion of a mixed environment earlier this year on the OMPI list, but the thread petered out.

We would be happy to share our failing test case with whoever does the interop testing, if it could shed some light on the problem we see.

The point is that we would like to know that different IB cards can work together (the way Ethernet cards do) so that we have a choice of vendors.

Sean Hefty wrote:
Is a mixed HCA environment cluster not ready for prime time - yet?

Are the crashes in the kernel or userspace?  Is there a specific HCA on the
nodes that crash?

Interop testing is done, but I do not know the details of the configurations and
tests that are run.
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

