[
https://issues.apache.org/jira/browse/MESOS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998805#comment-14998805
]
Neil Conway commented on MESOS-3870:
------------------------------------
You mean "volatile"? The variable is read and written inside a "synchronized"
block, which will do the necessary synchronization (memory barriers) to ensure
that other CPUs see the appropriate values (provided they also use synchronized
blocks when examining the variable).
There are a few places that read "ProcessBase.state" without holding the mutex
(e.g., ProcessManager::resume()). Such reads can race with the locked writes, so
they are probably unsafe and should be fixed.
(Note that "volatile" would not be sufficient or appropriate here anyway: in C++
it guarantees neither atomicity nor inter-thread ordering, so it cannot provide
reasonable semantics for concurrent access to shared state without mutual
exclusion.)
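When a lock really is undesirable, the C++ tool is `std::atomic`, not `volatile`. The sketch below uses a hypothetical `Mailbox` type that publishes a plain value through an atomic flag with release/acquire ordering. If the flag were a `volatile bool`, there would be no happens-before relationship for `payload` at all.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-shot mailbox: one thread publishes once, another
// polls. The atomic flag orders the plain write to `payload`.
struct Mailbox {
  int payload = 0;                  // plain data, published via `ready`
  std::atomic<bool> ready{false};   // `volatile bool` would NOT suffice

  // Call at most once: the data write is ordered before the release store.
  void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
  }

  // Returns true and fills `out` only once the payload is visible.
  bool tryConsume(int& out) const {
    if (!ready.load(std::memory_order_acquire)) {
      return false;
    }
    // Safe: the acquire load synchronizes-with the release store above.
    out = payload;
    return true;
  }
};
```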
> Prevent out-of-order libprocess message delivery
> ------------------------------------------------
>
> Key: MESOS-3870
> URL: https://issues.apache.org/jira/browse/MESOS-3870
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: Neil Conway
> Priority: Minor
> Labels: mesosphere
>
> I was under the impression that {{send()}} provided in-order, unreliable
> message delivery. So if P1 sends <M1,M2> to P2, P2 might see <>, <M1>, <M2>,
> or <M1,M2>, but not <M2,M1>.
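Put concretely, under in-order, unreliable delivery the receiver's view must be a subsequence of what was sent: any message may be dropped, but none may be reordered. Here is a small sketch of that check. `isOrderedUnreliableDelivery` is a hypothetical helper, and messages are modeled as strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `received` is a legal outcome of in-order, unreliable
// delivery of `sent`: messages may be dropped, but the survivors must
// appear in the order they were sent (`received` is a subsequence of `sent`).
bool isOrderedUnreliableDelivery(const std::vector<std::string>& sent,
                                 const std::vector<std::string>& received) {
  std::size_t i = 0;  // current position in `sent`
  for (const std::string& m : received) {
    // Skip over sent messages that were dropped until we find `m`.
    while (i < sent.size() && sent[i] != m) {
      ++i;
    }
    if (i == sent.size()) {
      return false;  // `m` was never sent, or it arrived out of order
    }
    ++i;  // consume the matched message
  }
  return true;
}
```

With `sent = {M1, M2}`, this accepts `{}`, `{M1}`, `{M2}` and `{M1, M2}`, and rejects `{M2, M1}`.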
> I suspect much of the code makes a similar assumption. However, it appears
> that this behavior is not guaranteed. slave.cpp:2217 has the following
> comment:
> {noformat}
> // TODO(jieyu): Here we assume that CheckpointResourcesMessages are
> // ordered (i.e., slave receives them in the same order master sends
> // them). This should be true in most of the cases because TCP
> // enforces in order delivery per connection. However, the ordering
> // is technically not guaranteed because master creates multiple
> // connections to the slave in some cases (e.g., persistent socket
> // to slave breaks and master uses ephemeral socket). This could
> // potentially be solved by using a version number and rejecting
> // stale messages according to the version number.
> {noformat}
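The version-number idea in that TODO could look roughly like the following sketch. `VersionedMessage` and `StaleMessageFilter` are hypothetical and do not reflect the actual message schema. The sketch assumes the sender increments the version on every message and that a newer message fully supersedes older ones, so discarding late arrivals is safe.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical message; field names are illustrative only.
struct VersionedMessage {
  std::string sender;   // e.g. an identifier for the master
  uint64_t version;     // incremented by the sender for every message
  std::string payload;
};

// Receiver-side filter: remembers the highest version applied per sender
// and rejects anything that is not newer, so a message that arrives late
// over a second connection cannot overwrite newer state.
class StaleMessageFilter {
public:
  // Returns true if `msg` should be applied, false if it is stale.
  bool accept(const VersionedMessage& msg) {
    auto it = latest_.find(msg.sender);
    if (it != latest_.end() && msg.version <= it->second) {
      return false;  // older than (or a duplicate of) what was applied
    }
    latest_[msg.sender] = msg.version;
    return true;
  }

private:
  std::unordered_map<std::string, uint64_t> latest_;
};
```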
> We can improve this situation by _either_: (1) fixing libprocess to guarantee
> ordered message delivery, e.g., by adding a sequence number, or (2)
> clarifying that ordered message delivery is not guaranteed, and ideally
> providing a tool to force messages to be delivered out of order.
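Option (1) could be sketched as a per-sender sequence number plus a reorder buffer on the receiver. This is only an illustration under stated assumptions. `InOrderReceiver` is hypothetical, each sender is assumed to number its messages 0, 1, 2, ..., and the sketch handles a single sender.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical receiver for ONE sender: hands messages to the application
// strictly in sequence order, buffering early arrivals and discarding
// duplicates or anything already delivered.
class InOrderReceiver {
public:
  // Accepts one arriving message; returns the (possibly empty) run of
  // messages that can now be delivered in order.
  std::vector<std::string> receive(uint64_t seq, const std::string& body) {
    std::vector<std::string> ready;
    if (seq < next_) {
      return ready;  // duplicate or stale: already delivered past it
    }
    pending_.emplace(seq, body);  // no-op if `seq` is already buffered
    // Drain the buffer while the next expected message is present.
    auto it = pending_.find(next_);
    while (it != pending_.end()) {
      ready.push_back(it->second);
      pending_.erase(it);
      ++next_;
      it = pending_.find(next_);
    }
    return ready;
  }

private:
  uint64_t next_ = 0;                        // next sequence number to deliver
  std::map<uint64_t, std::string> pending_;  // early arrivals, by sequence
};
```

There is a tradeoff. Because delivery is unreliable, a dropped message leaves a gap that this buffer would wait on forever. A real design would need either retransmission or a rule for giving up on a gap. One such rule is to discard anything older than the highest sequence number seen, which yields the in-order but unreliable semantics described in the issue.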
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)