At first glance, the test looks ok.

Why do you think <= is incorrect? Is there a buffer length problem somewhere?

I am able to reproduce the problem with 10 procs, though; it runs successfully with 8. I get the same results with both the openib BTL and the tcp BTL.

Can you file a ticket / dig a little deeper to see what's going wrong?
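
As a starting point for digging, here is a minimal sketch of what the two loop bounds actually change. It assumes a hypothetical MAXLEN of 10000 in the usage note below (the real value is defined in the test source and is not shown in this thread), and the helper names are made up for illustration. The one fact it relies on is that with recvcounts[i] = j for every rank, the reduce_scatter input buffer must hold tasks * j elements.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many message sizes the test loop visits.
 * inclusive != 0 models `j <= maxlen`; inclusive == 0 models `j < maxlen`.
 * maxlen is a stand-in for the test's MAXLEN (value assumed by the caller). */
int count_sizes(long maxlen, int inclusive)
{
    int n = 0;
    assert(maxlen >= 1);
    for (long j = 1; inclusive ? (j <= maxlen) : (j < maxlen); j *= 10) {
        n++;
    }
    return n;
}

/* Largest per-rank count j that the loop reaches. */
long largest_size(long maxlen, int inclusive)
{
    long last = 0;
    assert(maxlen >= 1);
    for (long j = 1; inclusive ? (j <= maxlen) : (j < maxlen); j *= 10) {
        last = j;
    }
    return last;
}

/* Elements the input buffer must hold when every rank receives j elements:
 * the sum of recvcounts, i.e. tasks * j. */
size_t sendbuf_elems(int tasks, long j)
{
    return (size_t)tasks * (size_t)j;
}
```

With the assumed MAXLEN of 10000, `<=` visits five sizes (up to j = 10000) and `<` visits four (up to j = 1000), so changing the bound only skips the largest size. For 10 procs the largest case would need sendbuf_elems(10, 10000) = 100000 elements versus 80000 for 8 procs, which is the number to compare against however the test allocates its buffers.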


On Sep 16, 2008, at 1:00 PM, Lenny Verkhovsky wrote:

I am running MTT tests on our cluster, and I found an error in the IBM reduce_scatter_in_place test for np > 8.

/home/USERS/lenny/OMPI_1_3_TRUNK/bin/mpirun -np 10 -H witch2 ./reduce_scatter_in_place

[**WARNING**]: MPI_COMM_WORLD rank 4, file reduce_scatter_in_place.c: 80:
bad answer (0) at index 0 of 1000 (should be 40000)
[**WARNING**]: MPI_COMM_WORLD rank 3, file reduce_scatter_in_place.c: 80: [**WARNING**]: MPI_COMM_WORLD rank 2, file reduce_scatter_in_place.c: 80:
bad answer (20916) at index 0 of 1000 (should be 20000)
bad answer (0) at index 0 of 1000 (should be 30000)
[**WARNING**]: MPI_COMM_WORLD rank 5, file reduce_scatter_in_place.c: 80:
bad answer (0) at index 0 of 1000 (should be 50000)
[**WARNING**]: MPI_COMM_WORLD rank 6, file reduce_scatter_in_place.c: 80:
bad answer (0) at index 0 of 1000 (should be 60000)
[**WARNING**]: MPI_COMM_WORLD rank 7, file reduce_scatter_in_place.c: 80: [**WARNING**]: MPI_COMM_WORLD rank 8, file reduce_scatter_in_place.c: 80:
bad answer (0) at index 0 of 1000 (should be 80000)
bad answer (0) at index 0 of 1000 (should be 70000)
[**WARNING**]: MPI_COMM_WORLD rank 9, file reduce_scatter_in_place.c: 80:
bad answer (0) at index 0 of 1000 (should be 90000)
[**WARNING**]: MPI_COMM_WORLD rank 0, file reduce_scatter_in_place.c: 80:
bad answer (-516024720) at index 0 of 1000 (should be 0)
[**WARNING**]: MPI_COMM_WORLD rank 1, file reduce_scatter_in_place.c: 80:
bad answer (28112) at index 0 of 1000 (should be 10000)

I think that the error is in the test itself.

--- sources/test_get__ibm/ibm/collective/reduce_scatter_in_place.c 2005-09-28 18:11:37.000000000 +0300
+++ installs/LKcC/tests/ibm/ibm/collective/reduce_scatter_in_place.c 2008-09-16 19:32:48.000000000 +0300
@@ -64,7 +64,7 @@ int main(int argc, char **argv)
ompitest_error(__FILE__, __LINE__, "Doh! Rank %d was not able to allocate enough memory. MPI test aborted!\n", myself);
  }

- for (j = 1; j <= MAXLEN; j *= 10) {
+ for (j = 1; j < MAXLEN; j *= 10) {
  for (i = 0; i < tasks; i++) {
  recvcounts[i] = j;
  }


I am not sure whether this is the right fix, or who can review and commit it to the test trunk.


Best regards

Lenny.


_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
