Are you using ompi-server for pub/sub, or just letting it default to mpirun?
You might want to print the port name you pass to MPI_Publish_name and the port name MPI_Lookup_name returns, to see if they match. If they are different, then you will definitely hang.

On Dec 21, 2010, at 6:41 AM, Suraj Prabhakaran wrote:

> Hello,
>
> This is basically a repost of my previous mail regarding problems with
> connect/accept and disconnect (*this is not related to spawning,
> parent/child*).
> I *sometimes* find processes blocking indefinitely at Connect/Accept calls or
> at Disconnect calls. I have an example below.
>
> Process A
> {
>   MPI_Open_port(...);
>   MPI_Publish_name(...);
>   MPI_Comm_accept(... &b_comm); // -----> (1)
>   // Do something1
>   MPI_Comm_disconnect(&b_comm); // ------> (2)
>   // Do something2
>
> }
>
> Process B
> {
>   MPI_Lookup_name(...);
>   MPI_Comm_connect(... &a_comm); // -----> (1)
>   // Do something1
>   MPI_Comm_disconnect(&a_comm); // ------> (2)
>   // Do something2
> }
>
> In the above scenario, in a perfect case where A reaches (1) without any
> problems, *sometimes* B blocks at its (1) indefinitely. All arguments passed
> to both functions are correct.
> Again, *sometimes* one of them blocks indefinitely at (2) while the other goes
> on to do something2. This could be a problem at the application level only
> if the one that blocks indefinitely were always the same, but it is not.
> Sometimes A blocks while B is busy doing something2, and sometimes A is busy
> doing its something2 while B blocks.
>
> Is this a known issue? Or am I the only person experiencing this, and it works
> cleanly for others who frequently use connect/accept/disconnect calls?
>
> Thanks,
> Suraj
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
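A minimal sketch of the port-name check, in C to match the pseudocode in the quoted mail. `log_port`, `port_names_match`, and the service name `"my-service"` are hypothetical names chosen for illustration; they are not part of MPI or Open MPI. The MPI calls shown in the comment use the standard MPI-2 signatures, assuming a buffer declared as `char port[MPI_MAX_PORT_NAME]`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an MPI/Open MPI function): print a port name with
 * a label and flush immediately, so the line still reaches the log if the
 * process then blocks in MPI_Comm_accept or MPI_Comm_connect.
 *
 * Intended call sites (assumed service name "my-service",
 * char port[MPI_MAX_PORT_NAME]):
 *
 *   Process A:
 *     MPI_Open_port(MPI_INFO_NULL, port);
 *     MPI_Publish_name("my-service", MPI_INFO_NULL, port);
 *     log_port(stderr, "A published", port);
 *
 *   Process B:
 *     MPI_Lookup_name("my-service", MPI_INFO_NULL, port);
 *     log_port(stderr, "B looked up", port);
 */
static void log_port(FILE *out, const char *who, const char *port)
{
    assert(out != NULL && who != NULL && port != NULL);
    fprintf(out, "[%s] port name: \"%s\"\n", who, port);
    fflush(out);
}

/* Hypothetical helper: returns 1 if two port-name strings are byte-for-byte
 * identical, 0 otherwise. Handy once the strings logged by A and B have been
 * collected in one place (for example, copied out of the two logs). */
static int port_names_match(const char *published, const char *looked_up)
{
    assert(published != NULL && looked_up != NULL);
    return strcmp(published, looked_up) == 0;
}
```

The explicit `fflush` is there because a process stuck inside `MPI_Comm_accept` or `MPI_Comm_connect` may never flush buffered output on its own.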