You're right, the sentence was messed up. What I meant to say is that I
found the problem, made a fix, and once this fix was applied to the trunk
I was no longer able to reproduce the deadlock.
Based on your description of the bug, I forced osu_bw to send 1024
non-blocking sends (instead of the default 64), and I still don't get
the deadlock. I'm thrilled ...
george.
On Apr 6, 2009, at 19:56, Eugene Loh wrote:
> George Bosilca wrote:
>
>> I got some free time (yeh haw) and took a look at the OB1 PML in
>> order to fix the issue. I think I found the problem, as I'm unable
>> to reproduce this error.
>
> Sorry, this sentence has me baffled. Are you unable to reproduce
> the problem before the fixes or afterwards? The first step is to
> reproduce the problem, right? To do so:
>
> A) Back out r20944. Easy way to do that is just
>      % setenv OMPI_MCA_mpool_sm_min_size 0
> B) Check that osu_bw.c hangs when using sm and you reach rendezvous
>    message size.
> C) Introduce your changes and make sure that osu_bw.c runs to
>    completion.
>
>> Can you please give it a try with 20946 and 20947 but without 20944?
>
> osu_bw.c hangs for me. The PML fix did not seem to work.
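
For anyone who wants to repeat steps A to C above, they could be scripted roughly as follows. This is only a sketch in Bourne shell (the thread uses the csh `setenv` form). The `OSU_BW` variable, the `./osu_bw` default path and the 300-second limit are assumptions made for illustration, not something stated in the thread. It also assumes the GNU `timeout` utility is available, and uses `--mca btl sm,self` to restrict the run to the shared-memory and self transports.

```shell
#!/bin/sh
# Step A: undo the effect of r20944 (sh equivalent of the setenv line above).
export OMPI_MCA_mpool_sm_min_size=0

# Assumed location of the benchmark binary; override OSU_BW to match your build.
OSU_BW=${OSU_BW:-./osu_bw}

if command -v mpirun >/dev/null 2>&1 && [ -x "$OSU_BW" ]; then
    # Steps B and C: run two ranks over shared memory only.
    # The 300 s limit is arbitrary; GNU timeout exits with 124 when it expires.
    timeout 300 mpirun -np 2 --mca btl sm,self "$OSU_BW"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "osu_bw timed out (possible hang)"
    else
        echo "osu_bw exited with status $status"
    fi
else
    echo "mpirun or $OSU_BW not found; skipping run"
fi
```

Running it once before the PML changes and once after gives a simple before/after comparison. A timeout is only a hint of a hang, not proof of a deadlock.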