He can try adding "-mca state_base_verbose 5", but if we are failing to catch 
SIGCHLD, I’m not sure what debugging info is going to help resolve that 
problem. These aren’t even fast-running apps, so there was plenty of time to 
register for the signal prior to termination.
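
For anyone following along, here is a minimal sketch of the general pattern being discussed: register a SIGCHLD handler before launching children, then reap every exited child inside the handler so none are left as zombies. This is not Open MPI's actual code. The names `on_sigchld`, `run_children`, and `reaped` are hypothetical, and it assumes a POSIX system where `os.fork` is available.

```python
import os
import signal
import time

reaped = {}  # pid -> exit code, filled in by the SIGCHLD handler


def on_sigchld(signum, frame):
    # SIGCHLD is not queued per child: several children may exit before
    # the handler runs once, so keep reaping until nothing is left.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children remain at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


def run_children(count, timeout=5.0):
    """Fork `count` short-lived children and wait for the handler to reap them."""
    # Register for the signal before any child can terminate.
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pids = []
        for i in range(count):
            pid = os.fork()
            if pid == 0:
                os._exit(i)  # child: exit at once with a distinct code
            pids.append(pid)
        deadline = time.monotonic() + timeout
        while len(reaped) < count and time.monotonic() < deadline:
            time.sleep(0.01)
        return {pid: reaped.get(pid) for pid in pids}
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old handler
```

If the handler is never invoked, or it calls `waitpid` only once per signal, exited children stay in the Z state until the parent itself exits, which matches the symptom George describes.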

I vaguely recollect that we have occasionally seen this on Mac before and it 
had something to do with oddness in SIGCHLD handling…


> On Jun 4, 2016, at 7:01 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
> Meh.  Ok.  Should George run with some verbose level to get more info?
> 
>> On Jun 4, 2016, at 6:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>> Neither of those threads has anything to do with catching the SIGCHLD - 
>> threads 4-5 are listening for OOB and PMIx connection requests. It looks 
>> more like mpirun thought it had picked everything up and has begun shutting 
>> down, but I can’t really tell for certain.
>> 
>>> On Jun 4, 2016, at 6:29 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>> wrote:
>>> 
>>> On Jun 3, 2016, at 11:07 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> 
>>>> After finalize. As I said in my original email, I see all the output the 
>>>> application is generating, and all processes (which are local as this 
>>>> happens on my laptop) are in zombie mode (Z+). This basically means 
>>>> whoever was supposed to get the SIGCHLD didn't do its job of cleaning 
>>>> them up.
>>> 
>>> Ah -- so perhaps threads 1, 2, 3 are red herrings: the real problem here is 
>>> that the parent didn't catch the child exits (which presumably should have 
>>> been caught in threads 4 or 5).
>>> 
>>> Ralph: is there any state from threads 4 or 5 that would be helpful to 
>>> examine to see if they somehow missed catching child exits?
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2016/06/19070.php
>> 
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2016/06/19071.php
> 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/06/19072.php
