[
https://issues.apache.org/jira/browse/MESOS-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551568#comment-14551568
]
Benjamin Hindman commented on MESOS-2735:
-----------------------------------------
There are definitely differences in message queue behavior, one of which is
significantly safer than the other. There are two safety concerns that I can
think of, one of which [~jieyu] has addressed here but I'll repeat to be sure I
properly understood.
(1) Someone might write a ResourceEstimator that isn't asynchronous, causing
the slave to "block" while the resource estimator estimates.
(2) The ResourceEstimator might cause a denial of service attack on the slave.
I understand the concern with (1) but I'm not too anxious about it. Why? It
should be trivial to make a wrapper module which forces people to implement the
ResourceEstimator to be asynchronous, either using `async` like you suggested
or implementing a version of ResourceEstimator which wraps an actor (libprocess
process). We'll only need to do this once and then other ResourceEstimator
implementations can leverage this stuff.
On the other hand, I don't like the behavior of push because of (2).
Fundamentally, if the slave can't keep up with the rate at which the
ResourceEstimator is pushing then we could create a denial of service issue
with the slave, i.e., it takes a long time to process non-ResourceEstimator
messages because it's queue is full of just ResourceEstimator messages. I'm
more anxious about (2) than (1) because it's harder to find bugs in (2) than
with (1) since once you fix (1) it stays fixed forever but any time you updated
the algorithm you impact the potential to cause (2).
Now, I acknowledge that implementing this as a pull versus push will make the
implementation in the ResourceEstimator slightly more complicated, but not
really. In particular, it should be trivial to always use a `Queue` to achieve
the push semantics in any ResourceEstimator implementation, while still
providing the pull semantics externally. Make sense?
Finally, one of the advantages of the pull model is that it's easier to reason
about because we don't have "anonymous" lambdas that cause execution in some
other random place in the code (i.e., you can easily see in the slave where the
future that gets returned from `ResourceEstimator::estimate()` gets handled).
In addition, the ResourceEstimator remains "functional" in the sense that it
just has to return some value (or a future) from it's functions versus invoking
some callback that causes something to get run some other place (and in fact,
may also block, so isn't it safer for the ResourceEstimator to invoke the
callback in it's own `async`?).
The invocation of the `ResourceEstimator::estimate()` followed by the `.then`
is a nice pattern that let's us compose with other things as well, which is
harder to do with the lambda style callbacks and why we've avoided it where
we've been able (in fact, I'm curious which place in the code are you imitating
here?).
> Change the interaction between the slave and the resource estimator from
> polling to pushing
> --------------------------------------------------------------------------------------------
>
> Key: MESOS-2735
> URL: https://issues.apache.org/jira/browse/MESOS-2735
> Project: Mesos
> Issue Type: Bug
> Reporter: Jie Yu
> Assignee: Jie Yu
> Labels: twitter
>
> This will make the semantics more clear. The resource estimator can control
> the speed of sending resources estimation to the slave.
> To avoid cyclic dependency, slave will register a callback with the resource
> estimator and the resource estimator will simply invoke that callback when
> there's a new estimation ready. The callback will be a defer to the slave's
> main event queue.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)