Sorry, I meant "mpirun -mca btl_openib_mtu 4 ..." (no equals sign).

On Wed, 2009-08-26 at 12:01 -0700, Ralph Campbell wrote:
> Is your switch configured for 4K MTU?
> The default Open MPI parameter for QLogic is to use a 4K MTU.
> Try using a 2K MTU with:
> "mpirun -mca btl_openib_mtu=4 ..." and see if that works.
>
>
> On Wed, 2009-08-26 at 02:09 -0700, Ole Widar Saastad wrote:
> > I am experiencing problems using the InfiniPath cards with the OFED
> > stack (details are given below).
> >
> > The problem seems to appear when the MPI message size grows above 2K.
> > As I recall, this is where the changeover from one transport mechanism
> > to another happens?
> >
> > The test is easy to run; it is just a bandwidth program.
> > (I got far better latency using the PathScale stack than with OFED. Is
> > this something that will be looked into in newer releases?)
> >
> > There are two nodes in the nodes.txt file, compute-1-0 and compute-1-1,
> > both connected to a SilverStorm switch.
> >
> > [ol...@login-0-2 bandwidth]$ mpirun -np 2 -machinefile ./nodes.txt ./bandwidth.openmpi.x -b o
> > Resolution (usec): 2.145767
> > Benchmark ping-pong
> > ===================
> > lenght   iterations  elapsed time  transfer rate  latency
> > (bytes)  (count)     (seconds)     (Mbytes/s)     (usec)
> > --------------------------------------------------------------------------
> > 0        10046       0.121         0.000          6.011
> > 1        10261       0.124         0.166          6.026
> > <cut a few lines>
> > 1024     7695        0.140         112.615        9.093
> > 1536     6260        0.133         144.469        10.632
> > 2048     5275        0.128         168.420        12.160
> > [0,1,0][btl_openib_component.c:1375:btl_openib_component_progress] from
> > compute-1-0 to: compute-1-1 error polling HP CQ with status RETRY EXCEEDED
> > ERROR status number 12 for wr_id 278309104 opcode 1
> > --------------------------------------------------------------------------
> > The InfiniBand retry count between two MPI processes has been
> > exceeded.
> > "Retry count" is defined in the InfiniBand spec 1.2
> > (section 12.7.38):
> >
> >     The total number of times that the sender wishes the receiver to
> >     retry timeout, packet sequence, etc. errors before posting a
> >     completion error.
> >
> > This error typically means that there is something awry within the
> > InfiniBand fabric itself. You should note the hosts on which this
> > error has occurred; it has been observed that rebooting or removing a
> > particular host from the job can sometimes resolve this issue.
> >
> > Two MCA parameters can be used to control Open MPI's behavior with
> > respect to the retry count:
> >
> > * btl_openib_ib_retry_count - The number of times the sender will
> >   attempt to retry (defaulted to 7, the maximum value).
> >
> > * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
> >   to 10). The actual timeout value used is calculated as:
> >
> >      4.096 microseconds * (2^btl_openib_ib_timeout)
> >
> > See the InfiniBand spec 1.2 (section 12.7.34) for more details.
> > --------------------------------------------------------------------------
> > mpirun noticed that job rank 1 with PID 9184 on node compute-1-1 exited on
> > signal 15 (Terminated).
> > [ol...@login-0-2 bandwidth]$
> >
> >
> > Background information:
> >
> >
> > 07:00.0 InfiniBand: QLogic, Corp. InfiniPath PE-800 (rev 02)
> >         Subsystem: QLogic, Corp. InfiniPath PE-800
> >         Flags: bus master, fast devsel, latency 0, IRQ 66
> >         Memory at fde00000 (64-bit, non-prefetchable) [size=2M]
> >         Capabilities: [40] Power Management version 2
> >         Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
> >         Capabilities: [70] Express Endpoint IRQ 0
> >
> > compute-1-0.local# uname -a
> > Linux compute-1-0.local 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
> > compute-1-0.local#
> >
> >
> > compute-1-0.local# rpm -qa | grep ofed
> > libibverbs-utils-1.1.2-1.ofed1.4.2
> > librdmacm-utils-1.0.8-1.ofed1.4.2
> > libcxgb3-1.2.2-1.ofed1.4.2
> > ofed-scripts-1.4.2-0
> > libmlx4-1.0-1.ofed1.4.2
> > libibverbs-devel-1.1.2-1.ofed1.4.2
> > ofed-docs-1.4.2-0
> > ibvexdmtools-0.0.1-1.ofed1.4.2
> > libmthca-1.0.5-1.ofed1.4.2
> > libipathverbs-1.1-1.ofed1.4.2
> > mstflint-1.4-1.ofed1.4.2
> > libibumad-1.2.3_20090314-1.ofed1.4.2
> > libnes-0.6-1.ofed1.4.2
> > libibcommon-1.1.2_20090314-1.ofed1.4.2
> > libibverbs-1.1.2-1.ofed1.4.2
> > librdmacm-1.0.8-1.ofed1.4.2
> > qlgc_vnic_daemon-0.0.1-1.ofed1.4.2
> > compute-1-0.local#
> >
> > Open MPI version: openmpi-1.2.8, compiled with gcc.
> >
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
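For readers wondering why "btl_openib_mtu 4" means a 2K MTU: the value appears to be the InfiniBand MTU enumeration code rather than a byte count. The sketch below is an illustration only, assuming the standard IB encoding (1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes); ib_mtu_bytes is a hypothetical helper, not an Open MPI or libibverbs function.

```python
def ib_mtu_bytes(code: int) -> int:
    """Convert an IB MTU code (the value passed to btl_openib_mtu) to bytes.

    Assumption: the standard InfiniBand MTU enumeration,
    1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes.
    """
    if not 1 <= code <= 5:
        raise ValueError(f"IB MTU code must be in 1..5, got {code}")
    # Each step doubles the MTU, starting from 256 bytes at code 1.
    return 128 << code


if __name__ == "__main__":
    for code in range(1, 6):
        print(f"btl_openib_mtu {code} -> {ib_mtu_bytes(code)} bytes")
```

Under that assumption, 4 selects 2048-byte packets and 5 selects 4096-byte packets, so forcing 4 avoids relying on the switch supporting a 4K MTU.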
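The ACK timeout formula quoted in the Open MPI error text, 4.096 microseconds * (2^btl_openib_ib_timeout), can be worked through with a short sketch. It assumes the parameter is the 5-bit IB local ACK timeout field (so 1..31 here); ack_timeout_ns is a hypothetical helper, and 0 is rejected because, as we understand the spec, 0 disables the timer rather than meaning 4.096 microseconds.

```python
def ack_timeout_ns(timeout: int) -> int:
    """Return the local ACK timeout in nanoseconds for a given
    btl_openib_ib_timeout value: 4.096 us * 2**timeout.

    Assumption: the value is the 5-bit IB local ACK timeout field.
    0 is rejected because we understand the spec to use it to disable
    the timer, which the formula does not describe.
    """
    if not 1 <= timeout <= 31:
        raise ValueError(f"timeout must be in 1..31, got {timeout}")
    # 4.096 us == 4096 ns, so shifting keeps the result an exact integer.
    return 4096 << timeout


if __name__ == "__main__":
    for t in (10, 14, 20):
        ns = ack_timeout_ns(t)
        print(f"btl_openib_ib_timeout={t}: {ns} ns ({ns / 1e6:.3f} ms)")
```

With the default value of 10 this gives 4194304 ns, roughly 4.2 ms per attempt; each increment of the parameter doubles it.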
