Other relevant info: I never saw this problem with OpenMPI 1.6.5, 1.8.4,
and 1.10.[3,4], which run the same test suite...
thanks,
Eric
On 13/09/16 11:35 AM, Eric Chamberland wrote:
Hi,
This is the third time this has happened in the last 10 days.
While running our nightly tests (~2200), one or two tests fail
at the very beginning with this strange error:
[lorien:142766] [[9325,5754],0] usock_peer_recv_connect_ack: received unexpected process identifier [[9325,0],0] from [[5590,0],0]
But I can't reproduce the problem right now: if I launch this test
alone "by hand", it succeeds, and the same test was also successful
yesterday...
Is there some kind of "race condition" that can occur during the creation
of "tmp" files when many tests run together on the same node? (We
oversubscribe the node, even for sequential runs...)
Here are the build logs:
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.09.13.01h16m01s_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.09.13.01h16m01s_ompi_info_all.txt
Thanks,
Eric
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel