On May 5, 2009, at 6:01 PM, Eugene Loh wrote:

You and Terry saw something that was occurring about 0.01% of the time
during MPI_Init during add_procs. That does not seem to be what we are
seeing here.


Right -- that's what I'm saying. It's different from the MPI_Init errors.

But we have seen failures in 1.3.1 and 1.3.2 that look like the one
here. They occur more like 1% of the time and can occur during MPI_Init
*OR* later during a collective call.  What we're looking at here seems
to be related.  E.g., see
http://www.open-mpi.org/community/lists/devel/2009/03/5768.php


Good to see that we're agreeing.

Yes, I agree that this is not a new error, but it is worth fixing. Cisco's MTT didn't run last night because there was no new trunk tarball. I'll check Cisco's MTT tomorrow morning to see whether there are any sm failures of this new flavor, and how frequently they're happening.

--
Jeff Squyres
Cisco Systems
