Neil Conway created MESOS-3870:
----------------------------------
Summary: Prevent out-of-order libprocess message delivery
Key: MESOS-3870
URL: https://issues.apache.org/jira/browse/MESOS-3870
Project: Mesos
Issue Type: Bug
Components: libprocess
Reporter: Neil Conway
Priority: Minor
I was under the impression that {{send()}} provides in-order, unreliable
message delivery: if P1 sends <M1,M2> to P2, P2 might see <>, <M1>, <M2>, or
<M1,M2>, but never <M2,M1>.
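To make the expected semantics concrete, here is a minimal sketch of a checker for "in-order, unreliable" delivery: the received messages must form a subsequence of the sent messages. The function name {{isInOrderDelivery}} is hypothetical and not part of libprocess, and the sketch assumes every sent message is distinct.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of libprocess). Returns true if
// `received` could result from in-order, unreliable delivery of
// `sent`: messages may be dropped but never reordered or duplicated.
// Assumes that every message in `sent` is distinct.
bool isInOrderDelivery(
    const std::vector<std::string>& sent,
    const std::vector<std::string>& received)
{
  std::size_t next = 0; // Index into `sent` of the next candidate match.

  for (const std::string& message : received) {
    // Skip over messages that were dropped in transit.
    while (next < sent.size() && sent[next] != message) {
      ++next;
    }

    if (next == sent.size()) {
      return false; // Never sent, duplicated, or arrived out of order.
    }

    ++next; // Consume the matched message.
  }

  return true;
}
```

With sent = <M1,M2>, this accepts <>, <M1>, <M2>, and <M1,M2>, and rejects <M2,M1>.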
I suspect much of the code makes a similar assumption. However, it appears that
this behavior is not guaranteed. slave.cpp:2217 has the following comment:
{noformat}
// TODO(jieyu): Here we assume that CheckpointResourcesMessages are
// ordered (i.e., slave receives them in the same order master sends
// them). This should be true in most of the cases because TCP
// enforces in order delivery per connection. However, the ordering
// is technically not guaranteed because master creates multiple
// connections to the slave in some cases (e.g., persistent socket
// to slave breaks and master uses ephemeral socket). This could
// potentially be solved by using a version number and rejecting
// stale messages according to the version number.
{noformat}
We can improve this situation by _either_: (1) fixing libprocess to guarantee
ordered message delivery, e.g., by adding a sequence number, or (2) clarifying
that ordered message delivery is not guaranteed, and ideally providing a tool
to force messages to be delivered out of order.
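As a rough illustration of option (1), the sketch below shows one way a receiver could use per-sender sequence numbers to reject stale messages. This is not libprocess code: {{SequencedMessage}} and {{OrderedReceiver}} are hypothetical names, and the sketch assumes each sender numbers its messages from 1 upward and never resets the counter (a restarted sender would need extra handling).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical message envelope; not a libprocess type.
struct SequencedMessage
{
  std::string sender;     // Identifies the sending process, e.g., "P1".
  std::uint64_t sequence; // Assumed to start at 1 and increase per send.
  std::string body;
};

// Hypothetical receiver that delivers messages from each sender in
// increasing sequence order and drops anything stale. Delivery stays
// unreliable: a message that arrives late is dropped, not buffered.
class OrderedReceiver
{
public:
  // Returns true if the message was delivered, false if it was
  // dropped because a newer message from the same sender was
  // already delivered.
  bool receive(const SequencedMessage& message)
  {
    // operator[] value-initializes the entry to 0 for a new sender.
    std::uint64_t& last = lastDelivered[message.sender];

    if (message.sequence <= last) {
      return false; // Stale or duplicate: reject.
    }

    last = message.sequence;
    delivered.push_back(message.body);
    return true;
  }

  // Bodies of the delivered messages, in delivery order.
  const std::vector<std::string>& log() const { return delivered; }

private:
  std::map<std::string, std::uint64_t> lastDelivered;
  std::vector<std::string> delivered;
};
```

If M2 (sequence 2) arrives before M1 (sequence 1), the receiver delivers M2 and drops M1, so P2 sees <M2> rather than <M2,M1>. Buffering late messages instead of dropping them would be a different design with stronger guarantees and more state.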