Ralph,

here is the full description of a race condition in oob/tcp that I very
briefly mentioned in a previous post :

the race condition can occur when two orted that are not yet connected to
each other both try to send their first message to the other at the same
time.

that can occur when running mpi helloworld on 4 nodes with the grpcomm/rcd
module.

here is a scenario in which the race condition occurs :

orted vpid 2 and 3 enter the allgather
/* they are not yet oob/tcp connected to each other */
and each one calls orte.send_buffer_nb to send to the other.
from a libevent point of view, vpid 2 and 3 will both call
mca_oob_tcp_peer_try_connect

vpid 2 calls mca_oob_tcp_send_handler

vpid 3 calls connection_event_handler

depending on the value returned by random() in libevent, vpid 3 will
either call mca_oob_tcp_send_handler (likely) or recv_handler (unlikely)
if vpid 3 calls recv_handler, it will close the two sockets to vpid 2

then vpid 2 will call mca_oob_tcp_recv_handler
(peer->state is MCA_OOB_TCP_CONNECT_ACK)
that will invoke mca_oob_tcp_recv_connect_ack
tcp_peer_recv_blocking will fail
/* zero bytes are recv'ed since vpid 3 previously closed the socket before
writing a header */
and this is handled by mca_oob_tcp_recv_handler as a fatal error
/* ORTE_FORCED_TERMINATE(1) */
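
to make the failing step concrete, here is a minimal standalone sketch (plain
POSIX sockets, not the actual oob/tcp code) of a blocking header read that
tells "peer closed before writing a header" apart from a real socket error.
the names hdr_status_t, recv_header_blocking and demo_peer_closes_first are
hypothetical and only serve the illustration; the 16 byte header size is
arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch, not part of oob/tcp. */
typedef enum {
    HDR_OK = 0,          /* the full header was received */
    HDR_PEER_CLOSED = 1, /* peer closed before sending a complete header */
    HDR_ERROR = 2        /* genuine socket error */
} hdr_status_t;

/* Read exactly `size` bytes, reporting an orderly close separately
 * from an error so the caller can decide whether it is fatal. */
static hdr_status_t recv_header_blocking(int sd, void *buf, size_t size)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < size) {
        ssize_t n = recv(sd, (char *)buf + got, size - got, 0);
        if (n == 0) {
            return HDR_PEER_CLOSED; /* recv() returns 0 on orderly shutdown */
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal, retry */
            }
            return HDR_ERROR;
        }
        got += (size_t)n;
    }
    return HDR_OK;
}

/* Miniature version of the scenario: the remote end closes its socket
 * without writing anything, then the local end tries to read a header. */
static hdr_status_t demo_peer_closes_first(void)
{
    int sv[2];
    char hdr[16];
    hdr_status_t rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return HDR_ERROR;
    }
    close(sv[1]);                   /* plays the role of vpid 3 */
    rc = recv_header_blocking(sv[0], hdr, sizeof(hdr));
    close(sv[0]);                   /* plays the role of vpid 2 */
    return rc;
}
```

in this sketch the reader sees HDR_PEER_CLOSED rather than a generic failure,
which is the information vpid 2 would need in order to treat the case
differently from a real error.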

could you please have a look at it ?

if you are too busy, could you please advise where this scenario should be
handled differently ?
- should vpid 3 keep one socket instead of closing both and retrying ?
- should vpid 2 handle the failure as a non-fatal error ?
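
for the first option, one possible way to "keep one socket" is a deterministic
tie-break that both sides can compute on their own. the sketch below is only
an assumption about how this could look, not what oob/tcp currently does: the
rule (the connection initiated by the higher vpid survives) and the names
keep_choice_t, resolve_simultaneous_connect and surviving_initiator are all
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical choice for this sketch: which of the two sockets a
 * process keeps after a simultaneous connect. */
typedef enum {
    KEEP_OUTGOING = 0, /* keep the socket this process connected with */
    KEEP_INCOMING = 1  /* keep the socket this process accepted */
} keep_choice_t;

/* Assumed rule: the connection initiated by the higher vpid survives.
 * Each side evaluates this locally, no extra message is exchanged. */
static keep_choice_t resolve_simultaneous_connect(uint32_t my_vpid,
                                                  uint32_t peer_vpid)
{
    assert(my_vpid != peer_vpid);
    return (my_vpid > peer_vpid) ? KEEP_OUTGOING : KEEP_INCOMING;
}

/* vpid of the process that initiated the connection this side keeps.
 * Both sides must get the same answer for the rule to be consistent. */
static uint32_t surviving_initiator(uint32_t my_vpid, uint32_t peer_vpid)
{
    keep_choice_t c = resolve_simultaneous_connect(my_vpid, peer_vpid);
    return (c == KEEP_OUTGOING) ? my_vpid : peer_vpid;
}
```

with this rule, vpid 2 keeps the socket it accepted and vpid 3 keeps the socket
it connected with, so both end up on the same connection and neither has to
close both sockets and retry.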

Cheers,

Gilles
