Hi,

This is the third time this has happened in the last 10 days.

While running our nightly tests (~2200), we have one or two tests that fail at the very beginning with this strange error:

[lorien:142766] [[9325,5754],0] usock_peer_recv_connect_ack: received unexpected process identifier [[9325,0],0] from [[5590,0],0]

But I can't reproduce the problem right now, i.e. if I launch this test alone "by hand", it is successful. The same test was also successful yesterday.

Is there some kind of "race condition" that can happen on the creation of "tmp" files if many tests run together on the same node? (We are oversubscribing, even for sequential runs.)

Here are the build logs:

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.09.13.01h16m01s_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.09.13.01h16m01s_ompi_info_all.txt

Thanks,

Eric
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
