On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote:
> David --
> 
> Gleb and I just actively re-looked at this problem yesterday; we
> think it's related to https://svn.open-mpi.org/trac/ompi/ticket/1015.
> We previously thought this ticket was a different problem, but our
> analysis yesterday shows that it could be a real problem in the
> openib BTL or ob1 PML (kinda think it's the openib BTL because it
> doesn't seem to happen on other networks, but who knows...).
> 
> Gleb is investigating.
Here is the result of the investigation. The problem is different from
ticket #1015. What we have here is one rank calling isend() of a small
message and wait_all() in a loop while another rank calls irecv(). The
problem is that isend() usually doesn't call opal_progress() anywhere,
and wait_all() doesn't call progress if all requests are already
completed, so messages are never progressed. We can force opal_progress()
to be called by setting btl_openib_free_list_max to 1000; then wait_all()
will call progress because not every request will be completed immediately
by OB1. Alternatively, we can limit the number of uncompleted requests
that OB1 can allocate by setting pml_ob1_free_list_max to 1000; then
opal_progress() will be called from free_list_wait() when the maximum is
reached. The second option works much faster for me.
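
For reference, here is a minimal sketch of the send-side pattern described
above (illustrative only, not the attached bcast-hang.c; the loop count,
message size and two-rank point-to-point shape are arbitrary choices):

#include <mpi.h>

#define NITER 1000000

int main(int argc, char **argv)
{
    int rank, i, buf = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (i = 0; i < NITER; i++) {
            /* Small (eager) message: per the analysis above, the request
             * is typically already complete when wait_all() is called, so
             * the sender never drives opal_progress(). */
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);
        }
    } else if (rank == 1) {
        for (i = 0; i < NITER; i++) {
            MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}

Assuming the usual MCA command-line syntax, the two workarounds can be
tried directly with David's run line, e.g.:

  mpirun --mca btl_openib_free_list_max 1000 --mca btl self,openib --npernode 1 --np 4 bcast-hang
  mpirun --mca pml_ob1_free_list_max 1000 --mca btl self,openib --npernode 1 --np 4 bcast-hang

(the second being the option noted above as much faster).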

> 
> 
> 
> On Oct 5, 2007, at 12:59 AM, David Daniel wrote:
> 
> > Hi Folks,
> >
> > I have been seeing some nasty behaviour in collectives,  
> > particularly bcast and reduce.  Attached is a reproducer (for bcast).
> >
> > The code will rapidly slow to a crawl (usually interpreted as a  
> > hang in real applications) and sometimes gets killed with sigbus or  
> > sigterm.
> >
> > I see this with
> >
> >   openmpi-1.2.3 or openmpi-1.2.4
> >   ofed 1.2
> >   linux 2.6.19 + patches
> >   gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)
> >   4 socket, dual core opterons
> >
> > run as
> >
> >   mpirun --mca btl self,openib --npernode 1 --np 4 bcast-hang
> >
> > To my now uneducated eye it looks as if the root process is rushing  
> > ahead and not progressing earlier bcasts.
> >
> > Anyone else seeing similar?  Any ideas for workarounds?
> >
> > As a point of reference, mvapich2 0.9.8 works fine.
> >
> > Thanks, David
> >
> >
> > <bcast-hang.c>
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 

--
                        Gleb.
