Hi folks,

We should clarify $SUBJECT. My understanding of the current situation is:

(1) For local messages (dispatch(), send() to a local process),
ordered delivery is guaranteed.

(2) For remote messages, ordered delivery is *not* guaranteed.

(3) Despite #2, in many cases messages from one process to a remote
process will be delivered via a single TCP connection, which will then
provide ordering. Hence, in the common case the message stream will be
ordered, but not in two situations:

(a) Two processes that communicate without first calling link(). Any
two messages might be sent over different TCP connections, so we have
no ordering guarantees.

(b) It seems that in the presence of socket errors, link() is not
sufficient to guarantee ordered delivery. For example, suppose P1 links
to P2 and then sends M1, M2, M3. M1 is sent over socket A; there is a
socket error during the send of M2; we then use socket B to send M3. M1
and M3 are then racing.
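
To make case (b) concrete, here is a toy model of the race. It is not
libprocess code: the arrival_order() helper, the socket names, and the
latency numbers are all made up for illustration, and it assumes that
M1 can still be in flight on socket A when M3 goes out on socket B.

```python
# Toy model of case (b): each message travels over one socket, and each
# socket has its own hypothetical one-way latency. A send whose socket is
# None models a send that failed with a socket error.

def arrival_order(sends, latency):
    """Return the messages in the order the receiver would see them.

    sends:   list of (send_time, message, socket_name_or_None)
    latency: dict mapping socket name to one-way delay
    """
    arrivals = []
    for send_time, message, socket in sends:
        if socket is None:
            continue  # the send failed, so nothing arrives
        arrivals.append((send_time + latency[socket], message))
    return [message for _, message in sorted(arrivals)]


sends = [
    (0, "M1", "A"),   # sent over socket A
    (1, "M2", None),  # socket error during the send
    (2, "M3", "B"),   # sent over a fresh socket B
]

# If socket A is slower than socket B, M3 overtakes M1.
print(arrival_order(sends, {"A": 5, "B": 1}))  # ['M3', 'M1']

# With equal latencies the send order happens to be preserved.
print(arrival_order(sends, {"A": 1, "B": 1}))  # ['M1', 'M3']
```

The point is only that once two connections are involved, the arrival
order depends on per-connection timing rather than on send order.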

Moreover, this all seems very dependent on the vagaries of the
libprocess implementation.

Questions:

(1) Do we want to guarantee message ordering between two remote processes?

Perhaps people with more knowledge of the Mesos codebase can comment
on how often such an ordering guarantee would be useful -- or how
often we assume it is provided right now :)

(2) If yes, we likely need to implement our own sequencing, acking,
and retransmission logic.

Note that simply assigning sequence numbers to outbound messages is
not sufficient: if a message is dropped, any subsequent messages will
never be delivered without a retransmission scheme.

(3) If no, it would be nice to provide a "demonic" socket mode in
libprocess, where the socket manager tries to maximize the chance that
two remote messages are reordered [1].
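
To illustrate the note under question (2), here is a minimal sketch of
a receiver that reorders by sequence number. The InOrderReceiver class
is hypothetical and not part of libprocess; it assumes sequence numbers
start at 0 and increase by one per message. It shows that a dropped
message stalls everything behind it until it is retransmitted.

```python
class InOrderReceiver:
    """Hypothetical receiver that delivers messages in sequence order."""

    def __init__(self):
        self.next_seq = 0    # next sequence number we may deliver
        self.pending = {}    # out-of-order messages, keyed by sequence number
        self.delivered = []  # messages handed to the application, in order

    def receive(self, seq, message):
        if seq < self.next_seq:
            return  # duplicate of an already delivered message; ignore it
        self.pending[seq] = message
        # Deliver every message that is now contiguous with what came before.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


receiver = InOrderReceiver()
receiver.receive(0, "M1")
receiver.receive(2, "M3")   # M2 (seq 1) was dropped on the wire
print(receiver.delivered)   # ['M1'] -- M3 is held back indefinitely

receiver.receive(1, "M2")   # only a retransmission unblocks delivery
print(receiver.delivered)   # ['M1', 'M2', 'M3']
```

A real scheme would also need acks on the sender side to decide when to
retransmit, plus a bound on how much the receiver is willing to buffer.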

Comments welcome!

Neil

[1] This is related to what the simulation WG has been looking into.
