These are the MPI_COMPLEX failures that I reported to George last week.


On Jul 28, 2009, at 8:06 PM, Ralph Castain wrote:

Hi folks

I was reviewing the trunk MTT results tonight and found a ton of
failures in the Intel test suite on IU's odin cluster. That cluster
*usually* runs pretty clean, so I took a closer look.

What I found was that the errors were all typified by the following:

  MPITEST_INFO (         0): Starting test MPI_Allgather()
[odin001:31038] *** Process received signal ***
[odin001:31038] Signal: Floating point exception (8)
[odin001:31038] Signal code: Integer divide-by-zero (1)
[odin001:31038] Failing at address: 0x804c8c9
[odin001:31039] *** Process received signal ***
[odin001:31039] Signal: Floating point exception (8)
[odin001:31039] Signal code: Integer divide-by-zero (1)
[odin001:31039] Failing at address: 0x804c8c9
[odin001:31040] *** Process received signal ***
[odin001:31040] Signal: Floating point exception (8)
[odin001:31040] Signal code: Integer divide-by-zero (1)
[odin001:31040] Failing at address: 0x804c8c9
[odin001:31038] [ 0] [0xffffe600]
[odin001:31038] [ 1] src/MPI_Allgather_f(MAIN__+0x2db) [0x804b30f]
[odin001:31038] [ 2] src/MPI_Allgather_f(main+0x27) [0x805aa57]
[odin001:31038] [ 3] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7c32dec]
[odin001:31038] [ 4] src/MPI_Allgather_f [0x804af81]
[odin001:31038] *** End of error message ***


In other words, a divide-by-zero exception (SIGFPE with an integer
divide-by-zero signal code) on a collective test.

Any ideas what might be causing this?

Ralph

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
jsquy...@cisco.com
