On 2 Jun 2010, at 16:49, Jeff Squyres wrote:

> On Jun 2, 2010, at 11:29 AM, Sylvain Jeaugey wrote:
> 
>> But it made me progress on why I'm crashing : in my case, only a subset of
>> processes have their create_cq fail.
> 
> Ah, this is the key.  If I have one process (out of many) fail the 
> create_cq() function, I get a segv during finalize.  I'll dig.

Is there an assumption that if process A claims to be able to communicate with 
process B, then process B can also communicate with process A?  It almost sounds 
like the code needs to do an allreduce on the bitmask returned by the BTLs.
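
To make the idea concrete, here is a rough sketch of what I mean, not a
patch against the real code. It simulates the reduction in plain C rather
than calling MPI, and everything in it is my own assumption: the layout
(bit j of reach[i] is set if process i believes it can reach process j),
the 32-process limit, and the function names allreduce_band and
reach_is_symmetric. In a real job the first function would correspond to
an MPI_Allreduce with the MPI_BAND operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: bit j of reach[i] is set if process i believes
 * it can reach process j. One 32-bit mask per process, so at most 32
 * processes in this sketch.
 */

/* Simulates a bitwise-AND allreduce over the per-process masks.
 * Every process would end up holding this same agreed value. */
uint32_t allreduce_band(const uint32_t *reach, size_t nprocs)
{
    assert(nprocs > 0);
    uint32_t agreed = reach[0];
    for (size_t i = 1; i < nprocs; i++)
        agreed &= reach[i];
    return agreed;
}

/* Returns 1 if "i can reach j" always matches "j can reach i",
 * and 0 as soon as one asymmetric pair is found. */
int reach_is_symmetric(const uint32_t *reach, size_t nprocs)
{
    assert(nprocs <= 32);
    for (size_t i = 0; i < nprocs; i++) {
        for (size_t j = 0; j < nprocs; j++) {
            uint32_t i_to_j = (reach[i] >> j) & 1u;
            uint32_t j_to_i = (reach[j] >> i) & 1u;
            if (i_to_j != j_to_i)
                return 0;
        }
    }
    return 1;
}
```

With the AND reduction, a peer only stays in the agreed mask if every
process reported it as reachable, so a single process whose create_cq()
failed would drop that path for everyone instead of leaving the job in
an inconsistent state.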

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk

