The symptom is that the process hangs forever. It's difficult to differentiate this bug from simply running out of registered memory.
The bug is hit if the pml is using the mpi_leave_pinned protocol and the btl returns an error from its send function.

-Nathan

________________________________________
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of Christopher Samuel [sam...@unimelb.edu.au]
Sent: Thursday, March 01, 2012 7:58 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

On 02/03/12 02:56, Nathan Hjelm wrote:

> Found a pretty nasty frag leak (and a minor one) in ob1 (see
> commit below). If this fix addresses some hangs we are seeing on
> infiniband LANL might want a 1.4.6 rolled (or a faster rollout for
> 1.6.0).

What symptoms would an affected job show? Does it fail with an OMPI error or does it just hang using 0% CPU?

cheers,
Chris
-- 
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
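To make the failure mode concrete, here is a minimal C sketch of how a fragment leaked on a send-error path can end in a hang. Everything in it is hypothetical: `frag_t`, `frag_alloc`, `frag_return`, `btl_send_fails` and `do_send` are illustrative stand-ins, not the actual ob1/BTL code, and the real fix in r26077 may look different. The sketch assumes a fixed-size pool of send fragments; when `frag_alloc` returns NULL, that stands in for the point where a real PML would block waiting for a fragment that is never given back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical send-fragment descriptor (not Open MPI's real type). */
typedef struct frag {
    struct frag *next;
} frag_t;

static frag_t storage[POOL_SIZE];
static frag_t *free_list = NULL;

/* Put every fragment back on the free list. */
static void pool_init(void) {
    free_list = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        storage[i].next = free_list;
        free_list = &storage[i];
    }
}

/* Take a fragment. NULL marks the spot where a real PML would wait
 * (forever, if every fragment has leaked). */
static frag_t *frag_alloc(void) {
    frag_t *f = free_list;
    if (f != NULL) {
        free_list = f->next;
    }
    return f;
}

static void frag_return(frag_t *f) {
    f->next = free_list;
    free_list = f;
}

static int count_free(void) {
    int n = 0;
    for (frag_t *f = free_list; f != NULL; f = f->next) {
        n++;
    }
    return n;
}

/* Stand-in for a BTL send function that always reports an error. */
static int btl_send_fails(frag_t *f) {
    (void)f;
    return -1;
}

/* Returns 0 on success, the BTL error code on failure, or -2 if no
 * fragment was available (the would-be hang). */
static int do_send(bool release_on_error) {
    frag_t *f = frag_alloc();
    if (f == NULL) {
        return -2;
    }
    int rc = btl_send_fails(f);
    if (rc != 0) {
        if (release_on_error) {
            frag_return(f); /* fixed path: give the fragment back */
        }
        /* buggy path: f is dropped here and never returned to the pool */
        return rc;
    }
    /* Simplification: real code would return the fragment on send completion. */
    frag_return(f);
    return 0;
}
```

With `release_on_error` false, `POOL_SIZE` failed sends drain the pool and the next call finds no fragment at all; with it true, the pool stays full no matter how many sends fail.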