Lenny,

I guess you're running the latest version. If not, please update; Galen and I fixed some bugs last week. If you are using the latest (and greatest), then ... well, I imagine there is at least one bug left.
There is a quick test you can do. In btl_sm.c, in the module structure at the beginning of the file, please replace the sendi function with NULL. If this fixes the problem, then at least we know that it's an sm send-immediate problem.
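To illustrate why that test isolates the problem, here is a minimal sketch of the general pattern, not the actual Open MPI code. All names (demo_module_t, demo_send, demo_sendi, demo_dispatch) are hypothetical, and it assumes the caller checks the optional function pointer for NULL before using it, which is what the suggestion above relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified module table. This is NOT the real BTL structure;
 * it only mirrors the idea of an optional "send immediate" entry. */
typedef struct demo_module {
    int (*send)(const void *buf, size_t len);   /* regular path, always set */
    int (*sendi)(const void *buf, size_t len);  /* optional fast path, may be NULL */
} demo_module_t;

enum { DEMO_PATH_REGULAR = 1, DEMO_PATH_IMMEDIATE = 2 };

/* Stand-ins for the two send paths; each just reports which path ran. */
static int demo_send(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return DEMO_PATH_REGULAR;
}

static int demo_sendi(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return DEMO_PATH_IMMEDIATE;
}

/* Assumed caller behaviour: use the immediate path only when the module
 * provides one, otherwise fall back to the regular send. */
static int demo_dispatch(const demo_module_t *m, const void *buf, size_t len)
{
    assert(m != NULL && m->send != NULL);
    if (m->sendi != NULL) {
        return m->sendi(buf, len);
    }
    return m->send(buf, len);
}

/* Module as shipped: the immediate path is enabled. */
static const demo_module_t demo_with_sendi = { demo_send, demo_sendi };

/* Module for the quick test: the sendi entry is replaced by NULL. */
static const demo_module_t demo_without_sendi = { demo_send, NULL };
```

With the entry set to NULL, demo_dispatch() never reaches the immediate path, so if the hang disappears under the same workload, the immediate path is the likely culprit.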

Thanks,
  george.

On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:

Hi, George,

I have a problem running the BW benchmark on a 100-rank cluster after r18551. The BW benchmark is mpi_p, which runs mpi_bandwidth with 100K between all pairs.

#mpirun -np 100 -hostfile hostfile_w ./mpi_p_18549 -t bw -s 100000
BW (100) (size min max avg) 100000 576.734030 2001.882416 1062.698408
#mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 100000
mpirun: killing job...
(it hangs even after 10 hours)

It doesn't happen if I run with --bynode or with btl openib,self only.

Lenny.