Jeff,
On Monday 06 July 2009 11:05:16 am Jeff Squyres wrote:
> I notice that in the new HLRS mpi_test_suite, I'm getting oodles of
Well, the test suite is not really new (it was started some time around 2003).
Regular ompi testing is what is new ;-)) Thanks for that, Jeff!


> errors with the MPI_TYPE_MIX and MPI_SHORT_INT datatypes (Linux/
> x86_64).  I have to run with:
>
>    mpirun mpi_test_suite -d All,\!MPI_TYPE_MIX,\!MPI_SHORT_INT
>
> (which excludes these two types)
>
> I can't quite follow the test suite code, but MPI_TYPE_MIX is some
> kind of derived MPI datatype.
Yes. Basically MPI_TYPE_MIX (and MPI_TYPE_MIX_LB_UB) is a struct of 11 basic 
types:
MPI_Datatype mix_type[11] = {MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG,
                       MPI_FLOAT, MPI_DOUBLE, MPI_FLOAT_INT,
                       MPI_DOUBLE_INT, MPI_LONG_INT, MPI_SHORT_INT, MPI_2INT};


Now, since it contains MPI_SHORT_INT (which contains a hole), both failures 
may well have the same cause!
This has to be investigated.
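
To illustrate what "hole" means here, a minimal sketch in plain C (not code 
from the test suite). It assumes the MPI implementation lays out MPI_SHORT_INT 
like the C struct of a short followed by an int; the struct and function names 
are made up for this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical C counterpart of MPI_SHORT_INT (illustrative name). */
struct short_int {
    short val;
    int   loc;
};

/* Number of padding bytes ("the hole") between val and loc. */
size_t short_int_hole_bytes(void)
{
    /* loc can never start before the end of val. */
    assert(offsetof(struct short_int, loc) >= sizeof(short));
    return offsetof(struct short_int, loc) - sizeof(short);
}

/* Print the layout the compiler chose on this platform. */
void print_short_int_layout(void)
{
    printf("sizeof=%zu offsetof(loc)=%zu hole=%zu\n",
           sizeof(struct short_int),
           offsetof(struct short_int, loc),
           short_int_hole_bytes());
}
```

Calling print_short_int_layout() shows whether there is padding on a given 
platform; the exact numbers depend on the ABI, so none are claimed here.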


> Is something wrong with our datatype engine?  Or are these tests
> faulty?
First of all, the MPI standard defines pair types such as MPI_FLOAT_INT or 
MPI_SHORT_INT for use in reduction operations (MPI_MINLOC and MPI_MAXLOC).
Nevertheless, they should work fine as ordinary datatypes here.

Now, MPIch2-1.1 works fine with all the datatypes (including MPI_TYPE_MIX):
------------------------------
mpirun -np 2 ./mpi_test_suite -t 'P2P,Collective' -r FULL -x strict
P2P tests Ring (3/44), comm MPI_COMM_WORLD (1/13), type MPI_CHAR (1/29)
...
Collective tests Alltoall (47/44), comm Intracomm merged of the Halved 
Intercomm (13/13), type MPI_TYPE_MIX_LB_UB (29/29)
Number of failed tests:0
------------------------------



> I don't know if anyone has run this test suite with any regularity before,
> so I don't know which it is...
Tests with these datatypes have been run on IBM's MPI, NEC's MPI (derived from 
MPIch) and Intel MPI (well, also MPIch based) although these were tested some 
time ago.
Tests against MPIch-1 and now MPIch2 have been done very often and bugs have 
been tracked down, so I believe the core of the test suite itself is fine!

[I am not talking about correctness of individual tests themselves, e.g. -t 
one-sided will definitely show bugs in the test-suite]...

With best regards,
Rainer


PS: The test suite fills the send buffers with known values according to the 
datatype being passed to the test and afterwards checks against expected 
values.
The send and recv buffers are preset with a definable pattern (0xa5) to check 
for overwritten data in holes (see type MPI_TYPE_MIX_LB_UB).
The buffer starts with the MIN, then the MAX value of the given datatype, 
followed by (2+rank_of_comm_partner), (3+rank...) etc.
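
As a rough sketch of that scheme in plain C (not the test suite's actual code): 
the helper names are hypothetical, the layout again assumes MPI_SHORT_INT 
matches a C struct of short and int, and the value stored in the int member is 
only a placeholder. The members are written with memcpy so that the padding 
bytes are left untouched:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define FILL_PATTERN 0xa5   /* preset pattern mentioned above */

/* Hypothetical C counterpart of MPI_SHORT_INT (illustrative name). */
struct short_int {
    short val;
    int   loc;
};

/* Element i holds MIN, then MAX, then 2+rank, 3+rank, ... */
short expected_short(size_t i, int partner_rank)
{
    if (i == 0) return SHRT_MIN;
    if (i == 1) return SHRT_MAX;
    return (short)((int)i + partner_rank);
}

/* Preset the whole buffer with the pattern, then store only the members. */
void fill_buffer(unsigned char *buf, size_t n, int partner_rank)
{
    memset(buf, FILL_PATTERN, n * sizeof(struct short_int));
    for (size_t i = 0; i < n; i++) {
        unsigned char *elem = buf + i * sizeof(struct short_int);
        short val = expected_short(i, partner_rank);
        int loc = (int)i;   /* placeholder value, an assumption */
        memcpy(elem + offsetof(struct short_int, val), &val, sizeof val);
        memcpy(elem + offsetof(struct short_int, loc), &loc, sizeof loc);
    }
}

/* Count bytes in the hole between val and loc that lost the pattern. */
size_t count_overwritten_hole_bytes(const unsigned char *buf, size_t n)
{
    const size_t hole_begin = offsetof(struct short_int, val) + sizeof(short);
    const size_t hole_end = offsetof(struct short_int, loc);
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        const unsigned char *elem = buf + i * sizeof(struct short_int);
        for (size_t b = hole_begin; b < hole_end; b++) {
            if (elem[b] != FILL_PATTERN)
                bad++;
        }
    }
    return bad;
}
```

After a receive into a buffer preset like this, a non-zero count would mean 
that something wrote into the hole. Trailing padding, if any, is not checked 
in this sketch.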

One may check the hex values of ALL the communicated buffers using a higher 
report level (-r FULL); in that case, however, one may want to reduce the 
number of elements sent using -n, e.g. -n 10.
Higher values (the default is -n 1000), however, have exposed problems (which 
hinted at bugs) when switching from the eager protocol... These have been 
fixed in ompi.
-- 
------------------------------------------------------------------------
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab          Fax: +1 (865) 241-4811
PO Box 2008 MS 6164           Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink

