The symptom is that the process hangs forever. It's difficult to distinguish 
this bug from simply running out of registered memory.

The bug is hit if the pml is using the mpi_leave_pinned protocol and the btl 
returns an error from its send function.
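
As a rough illustration only (this is not the ob1 code, and every name here 
is hypothetical), the sketch below shows the general shape of a fragment leak 
on a send error path. It assumes a small fixed-size pool of fragments and a 
stand-in send function that always fails. It shows one way such a leak could 
eventually starve a sender, not necessarily how the actual hang arises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4  /* hypothetical: tiny pool so exhaustion is easy to see */

/* Hypothetical fragment descriptor; not the real ob1 frag type. */
typedef struct frag {
    struct frag *next;
} frag_t;

/* Hypothetical fixed-size pool with an intrusive free list. */
typedef struct {
    frag_t  slots[POOL_SIZE];
    frag_t *free_list;
    size_t  available;
} frag_pool_t;

void pool_init(frag_pool_t *p) {
    p->free_list = NULL;
    p->available = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
        p->available++;
    }
}

/* Take a fragment from the pool, or return NULL if none are left. */
frag_t *frag_alloc(frag_pool_t *p) {
    frag_t *f = p->free_list;
    if (f != NULL) {
        p->free_list = f->next;
        p->available--;
    }
    return f;
}

/* Return a fragment to the pool. */
void frag_release(frag_pool_t *p, frag_t *f) {
    assert(f != NULL);
    f->next = p->free_list;
    p->free_list = f;
    p->available++;
}

/* Stand-in for a transport send function that always reports an error. */
int failing_send(const frag_t *f) {
    (void)f;
    return -1;
}

/* Leaky variant: the error path returns without releasing the fragment. */
int send_leaky(frag_pool_t *p) {
    frag_t *f = frag_alloc(p);
    if (f == NULL) {
        return -2;  /* pool exhausted; a real caller might wait indefinitely */
    }
    int rc = failing_send(f);
    if (rc != 0) {
        return rc;  /* BUG: f is never returned to the pool */
    }
    frag_release(p, f);
    return 0;
}

/* Fixed variant: the fragment is released on every path. */
int send_fixed(frag_pool_t *p) {
    frag_t *f = frag_alloc(p);
    if (f == NULL) {
        return -2;
    }
    int rc = failing_send(f);
    frag_release(p, f);
    return rc;
}
```

In this toy model, calling send_leaky() POOL_SIZE times drains the pool, and 
every later call finds no fragment available. send_fixed() can be called any 
number of times without the pool shrinking.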

-Nathan

________________________________________
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of 
Christopher Samuel [sam...@unimelb.edu.au]
Sent: Thursday, March 01, 2012 7:58 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/03/12 02:56, Nathan Hjelm wrote:

> Found a pretty nasty frag leak (and a minor one) in ob1 (see
> commit below). If this fix addresses some hangs we are seeing on
> infiniband LANL might want a 1.4.6 rolled (or a faster rollout for
> 1.6.0).

What symptoms would an affected job show?  Does it fail with an OMPI
error or does it just hang using 0% CPU?

cheers,
Chris
- --
    Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
         http://www.vlsci.unimelb.edu.au/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk9QN10ACgkQO2KABBYQAh9aRgCePZXdzqlI8lpfqWtHf8rtFvup
2D8An3E9y411xTyRBpfwHLPpWTzqUiuv
=3EXP
-----END PGP SIGNATURE-----
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
