Sorry, I checked it without sm. Please ignore this mail.

On Thu, Jun 19, 2008 at 4:32 PM, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:

> Hi,
> I found what caused the problem in both cases.
>
> --- ompi/mca/btl/sm/btl_sm.c (revision 18675)
> +++ ompi/mca/btl/sm/btl_sm.c (working copy)
> @@ -812,7 +812,7 @@
>       */
>      MCA_BTL_SM_FIFO_WRITE(endpoint, endpoint->my_smp_rank,
>                            endpoint->peer_smp_rank, frag->hdr, false, rc);
> -    return (rc < 0 ? rc : 1);
> +    return OMPI_SUCCESS;
>  }
>
> I am just not sure if it's OK.
>
> Lenny.
>
> On Wed, Jun 18, 2008 at 3:21 PM, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:
>
>> Hi,
>> I am not sure if it related,
>> but I applied your patch ( r18667 ) to r 18656 ( one before NUMA )
>> together with disabling sendi,
>> The result still the same ( hanging ).
>>
>> On Tue, Jun 17, 2008 at 2:10 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:
>>
>>> Lenny,
>>>
>>> I guess you're running the latest version. If not, please update, Galen
>>> and myself corrected some bugs last week. If you're using the latest (and
>>> greatest) then ... well I imagine there is at least one bug left.
>>>
>>> There is a quick test you can do. In the btl_sm.c in the module structure
>>> at the beginning of the file, please replace the sendi function by NULL. If
>>> this fix the problem, then at least we know that it's a sm send immediate
>>> problem.
>>>
>>> Thanks,
>>> george.
>>>
>>> On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:
>>>
>>>> Hi, George,
>>>>
>>>> I have a problem running BW benchmark on 100 rank cluster after r18551.
>>>> The BW is mpi_p that runs mpi_bandwidth with 100K between all pairs.
>>>>
>>>> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18549 -t bw -s 100000
>>>> BW (100) (size min max avg) 100000 576.734030 2001.882416 1062.698408
>>>> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 100000
>>>> mpirun: killing job...
>>>> ( it hangs even after 10 hours ).
>>>>
>>>> It doesn't happen if I run --bynode or btl openib,self only.
>>>>
>>>> Lenny.