Eugene Loh wrote:
Ralph Castain wrote:
Hi Bryan
I have seen similar issues on LANL clusters when message sizes were
fairly large. How big are your buffers when you call Allreduce? Can
you send us your Allreduce call params (e.g., the reduce operation,
datatype, num elements)?
If you don't want to send that to the list, you can send it to me at
LANL.
I haven't seen any updates on this. Please tell me Bryan sent info to
Ralph at LANL and Ralph nailed this one. Please! :^)
Eugene,
I've got mostly good news ...
Ralph sent me a platform file and a corresponding .conf file. I built
ompi from openmpi-1.3.3a1r21223.tar.gz, with these files. I've been
running my normal tests and have been unable to hang a job yet. I've
run enough that I don't expect to see a problem.
So we're up and running, but with some extra voodoo in the platform
files. This is on a totally vanilla Fedora 9 installation (other than a
couple of Fortran compilers, but we're not using the Fortran interface
to MPI), running on a Dell workstation with 2 quad-core CPUs - vanilla
hardware, too. MPI isn't working out of the box.
From a user's perspective, configure should be setting the right
defaults on such a setup. But the core code seems to be working - I'm
giving it a good hammering.
The Allreduces in question were doing a logical OR on 1 integer from
each process - it was an error check. Hence the buffers (on the
application side) were 4 bytes. There were only 4 processes involved.
- Bryan
--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico