We are running into an issue with the slave's status update manager. Below is
the behavior I am seeing.
Our use case: we run a stateful container (a Cassandra process), where the
executor polls the JMX port at a 60-second interval to get the Cassandra state
and sends that state to the agent -> master -> framework.
*A RUNNING Cassandra process translates to TASK_RUNNING.*
*A CRASHED or DRAINED Cassandra process translates to TASK_FAILED.*
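A minimal sketch of the translation the executor performs (the names here are
hypothetical, not the actual executor code):

```java
// Hypothetical sketch: map the mode reported over JMX to a Mesos task state.
enum CassandraMode { RUNNING, CRASHED, DRAINED }

final class StateTranslator {
    static String toTaskState(CassandraMode mode) {
        switch (mode) {
            case RUNNING:
                // A healthy Cassandra process keeps the task running.
                return "TASK_RUNNING";
            case CRASHED:
            case DRAINED:
                // Both crash and drain are treated as task failure.
                return "TASK_FAILED";
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```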
At some point, when acknowledgements are pending, the slave has multiple
TASK_RUNNING status updates in the stream followed by a TASK_FAILED. We use
explicit acknowledgements, and I see that the Mesos master receives all the
TASK_RUNNING updates and then the TASK_FAILED; the framework likewise receives
all the TASK_RUNNING updates followed by the TASK_FAILED. After receiving
TASK_FAILED, the framework restarts a different executor on the same machine
using the old persistent volume.
The issue is that *a reference to the old executor is held by the slave*
(presumably because it did not receive the acknowledgements, even though the
master and scheduler have processed the status updates), so it continues to
retry TASK_RUNNING indefinitely. The old executor process is no longer
running, while the new executor process is running and continues to work
as-is. This makes me believe there is a bug in the slave's status update
manager.
I read the slave's status update manager code; `recover` has a constraint to
ignore status updates from the stream if the last executor run has completed.
I think a similar constraint should apply to status update retries.
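To make the retry behavior concrete, here is a simplified model (not the
actual Mesos source) of the agent's per-task status update stream: updates are
forwarded one at a time, in order, and the head of the queue is retried until
it is explicitly acknowledged. With no acknowledgements arriving for a dead
executor run, the oldest TASK_RUNNING is retried forever, matching what I
describe above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the agent's per-task status update stream.
final class StatusUpdateStream {
    private final Deque<String> pending = new ArrayDeque<>();

    // A new update from the executor is appended to the stream.
    void update(String state) { pending.addLast(state); }

    // The update currently being (re)sent, or null if the stream is drained.
    // Later updates are NOT forwarded until this one is acknowledged.
    String forward() { return pending.peekFirst(); }

    // Only an acknowledgement for the head of the stream makes progress;
    // anything else is ignored (the agent would log a warning here).
    boolean acknowledge(String state) {
        if (state.equals(pending.peekFirst())) {
            pending.removeFirst();
            return true;
        }
        return false;
    }
}
```

In this model, if `acknowledge` is never called, `forward()` keeps returning
the same non-terminal update regardless of whether the executor that produced
it is still running.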
On Fri, Mar 16, 2018 at 7:47 PM, Benjamin Mahler <bmah...@apache.org> wrote:
> (1) Assuming you're referring to the scheduler's acknowledgement of a
> status update, the agent will not forward TS2 until TS1 has been
> acknowledged. So, TS2 will not be acknowledged before TS1 is acknowledged.
> FWICT, we'll ignore any violation of this ordering and log a warning.
> (2) To reverse the question, why would it make sense to ignore them?
> Assuming you're looking to reduce the number of round trips needed for
> schedulers to see the terminal update, I would point you to:
> (3) When the agent sees an executor terminate, it will transition all
> non-terminal tasks assigned to that executor to TASK_GONE (partition-aware
> framework) or TASK_LOST (non-partition-aware framework) or TASK_FAILED (if
> the container OOMed). There may be other cases, it looks a bit convoluted
> to me.
> On Thu, Mar 15, 2018 at 10:35 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> > Hi,
> > While designing the correct behavior for one of our frameworks, we
> > encountered some questions about the behavior of status updates:
> > The executor continuously polls the workload probe to get the current mode
> > of the workload (a Cassandra server), and sends various status update
> > states (STARTING, RUNNING, FAILED, etc.).
> > The executor polls every 30 seconds and sends a status update. Here, we
> > are seeing congestion on task update acknowledgements somewhere (cause
> > still unknown).
> > There are three scenarios that we want to understand.
> > 1. The agent queue has task updates TS1, TS2 & TS3 (in this order)
> > awaiting acknowledgement. Suppose TS2 receives an acknowledgement; what
> > will happen to the TS1 update in the queue?
> > 2. The agent queue has task updates TS1, TS2, TS3 & TASK_FAILED, where
> > TS1, TS2, TS3 are non-terminal updates. Once the agent has received a
> > terminal status update, does it make sense to ignore the non-terminal
> > updates in the queue?
> > 3. As per the ExecutorDriver code comment
> > <https://github.com/apache/mesos/blob/master/src/java/src/
> > org/apache/mesos/ExecutorDriver.java#L86>,
> > if the executor is terminated, does the agent send TASK_LOST? If so, does
> > it send it once or for each unacknowledged status update?
> > I'll study the code in the status update manager and agent separately,
> > but an official answer will definitely help.
> > Many thanks!
> > --
> > Cheers,
> > Zhitao Li