Hi George:
I did some more experimenting. Just copying over the btl_sm_fifo.h file
was not enough. I also had to make this change (which I found in the
trunk) to the btl_sm_component.c file. After that, my hangs went away.
burpen-csx10-0 164 =>svn diff btl_sm_component.c
Index: btl_sm_component.c
===================================================================
--- btl_sm_component.c (revision 19393)
+++ btl_sm_component.c (working copy)
@@ -389,9 +389,7 @@
opal_atomic_lock(fifo->tail_lock);
}
- hdr = (mca_btl_sm_hdr_t*)ompi_cb_fifo_read_from_tail(&fifo->tail->cb_fifo,
- fifo->tail->cb_overflow,
- &useless );
+ hdr = (mca_btl_sm_hdr_t*)ompi_fifo_read_from_tail(fifo);
/* release thread lock */
if(opal_using_threads()) {
burpen-csx10-0 165 =>
Rolf vandeVaart wrote:
George:
We are still seeing hangs in OMPI 1.3 which I assume are due to the
PML issue. However, we do not see it in the trunk. My investigation
is early, but I am wondering if the merge of the changes into v1.3 may
be missing a file. From the original fix in the trunk, I see the
following:
Changeset 19309 (trunk)
btl_sm.c (modified) (2 diffs)
btl_sm_component.c (modified) (7 diffs)
btl_sm_fifo.h (modified) (1 diff)
For the OMPI v1.3 branch I see this:
Changeset 19378 (v1.3)
btl/sm/btl_sm.c (modified) (1 diff)
btl/sm/btl_sm_component.c (modified) (2 diffs)
coll/sm/coll_sm_module.c (modified) (1 diff)
pml/ob1/pml_ob1_sendreq.c (modified) (1 diff)
The 1.3 changeset has those two extra files, but they were just
formatting fixes. So, my concern is the missing btl_sm_fifo.h change
in 1.3. I have not tried it out yet, but wanted to see if anyone else
is still seeing 1.3 hangs.
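
The comparison above can be sketched as a quick script. This is only an illustration: the file lists are copied by hand from the two changesets, the function names are made up for the example, and matching by base name is an assumption, since the two listings use different path prefixes.

```python
import posixpath

# File lists copied by hand from the two changesets quoted above.
trunk_r19309 = ["btl_sm.c", "btl_sm_component.c", "btl_sm_fifo.h"]
v13_r19378 = [
    "btl/sm/btl_sm.c",
    "btl/sm/btl_sm_component.c",
    "coll/sm/coll_sm_module.c",
    "pml/ob1/pml_ob1_sendreq.c",
]


def basenames(paths):
    """Reduce each path to its file name so differing prefixes still match."""
    return {posixpath.basename(p) for p in paths}


def compare_changesets(src, dst):
    """Return (files only in src, files only in dst), each sorted by name."""
    src_names, dst_names = basenames(src), basenames(dst)
    return sorted(src_names - dst_names), sorted(dst_names - src_names)


missing, extra = compare_changesets(trunk_r19309, v13_r19378)
print("in trunk changeset but not in v1.3:", missing)
print("in v1.3 changeset but not in trunk:", extra)
```

This only shows which files were touched; it says nothing about whether the individual diffs within a shared file (7 in the trunk versus 2 in v1.3 for btl_sm_component.c) line up.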
Rolf