Did you try it with OpenMPI version 1.3.1? There have been a few changes and bug fixes (for example r20591, a fix in the ob1 PML).
Lenny.

2009/3/23 Timothy Hayes <haye...@tcd.ie>

> Hello,
>
> I'm working on an OpenMPI BTL component and am having a recurring problem,
> I was wondering if anyone could shed some light on it. I have a component
> that's quite straight forward, it uses a pair of lightweight sockets to take
> advantage of being in a virtualised environment (specifically Xen). My code
> is a bit messy and has lots of inefficiencies, but the logic seems sound
> enough. I've been able to execute a few simple programs successfully using
> the component, and they work most of the time.
>
> The problem I'm having is actually happening in higher layers, specifically
> in my asynchronous receive handler, when I call the callback function
> (cbfunc) that was set by the PML in the BTL initialisation phase. It seems
> to be getting stuck in an infinite loop at __ompi_free_list_wait(), in this
> function there is a condition variable which should get set eventually but
> just doesn't. I've stepped through it with GDB and I get a backtrace of
> something like this:
>
> mca_btl_xen_endpoint_recv_handler -> mca_btl_xen_endpoint_start_recv ->
> mca_pml_ob1_recv_frag_callback -> mca_pml_ob1_recv_frag_match ->
> __ompi_free_list_wait -> opal_condition_wait
>
> and from there it just loops. Although this is happening in higher levels,
> I haven't noticed something like this happening in any of the other BTL
> components so chances are there's something in my code that's causing this.
> I very much doubt that it's actually waiting for a list item to be returned
> since this infinite loop can occur non deterministically and sometimes even
> on the first receive callback.
>
> I'm really not too sure what else to include with this e-mail. I could send
> my source code (a bit nasty right now) if it would be helpful, but I'm
> hoping that someone might have noticed this problem before or something
> similar. Maybe I'm making a common mistake. Any advice would be really
> appreciated!
>
> I'm using OpenMPI 1.2.9 from the SVN tag repository.
>
> Kind regards
> Tim Hayes