I'm not following what the bug is. The code you pointed to is called from here:

https://github.com/apache/mesos/blob/1.4.0/src/slave/status_update_manager.cpp#L762-L776

That is where we ignore duplicates and also ensure that the ack matches the latest update we've sent. So, from the code you pointed to I don't see an issue yet. Which update does the duplicate ack for 67f548b4-96cb-4b57-8720-2c8a4ba347e8 refer to? The logs you provided don't show when the status updates arrive for these; can you include that?
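For illustration, a minimal sketch of the kind of check described above (simplified stand-in types, not the actual Mesos source): the stream drops an acknowledgement it has already seen, and drops one that does not match the update it is currently waiting on.

    // Simplified model of a per-task status update stream; NOT the Mesos source.
    #include <deque>
    #include <iostream>
    #include <set>
    #include <string>

    struct Update {
      std::string uuid;
      std::string state;
    };

    struct StatusUpdateStream {
      std::deque<Update> pending;         // unacknowledged updates, oldest first
      std::set<std::string> acknowledged; // UUIDs of acks already processed

      // Returns true if the ack is accepted; otherwise logs why it was dropped.
      bool acknowledge(const std::string& uuid) {
        if (acknowledged.count(uuid) > 0) {
          std::cout << "Duplicate acknowledgement " << uuid << std::endl;
          return false;  // same ack seen before: ignore it
        }
        if (pending.empty() || pending.front().uuid != uuid) {
          // The ack must match the update currently in flight (the oldest
          // pending one); anything else is dropped with a warning.
          std::cout << "Acknowledgement " << uuid
                    << " does not match the pending update" << std::endl;
          return false;
        }
        acknowledged.insert(uuid);
        pending.pop_front();
        std::cout << "Acknowledged " << uuid << std::endl;
        return true;
      }
    };

    int main() {
      StatusUpdateStream stream;
      stream.pending.push_back({"uuid-1", "TASK_RUNNING"});
      stream.pending.push_back({"uuid-2", "TASK_RUNNING"});

      stream.acknowledge("uuid-2");  // dropped: uuid-1 is still unacknowledged
      stream.acknowledge("uuid-1");  // accepted
      stream.acknowledge("uuid-1");  // dropped: duplicate
      stream.acknowledge("uuid-2");  // accepted now that uuid-1 is done
      return 0;
    }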
On Mon, Apr 16, 2018 at 2:08 PM, Varun Gupta <var...@uber.com> wrote:

> We use explicit acks from the Scheduler.
>
> Here is a snippet of the logs. Please see the log lines for Status Update UUID: a918f5ed-a604-415a-ad62-5a34fb6334ef.
>
> W0416 00:41:25.843505 124530 status_update_manager.cpp:761] Duplicate status update acknowledgment (UUID: 67f548b4-96cb-4b57-8720-2c8a4ba347e8) for update TASK_RUNNING (UUID: a918f5ed-a604-415a-ad62-5a34fb6334ef) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> E0416 00:41:25.843559 124542 slave.cpp:2951] Failed to handle status update acknowledgement (UUID: 67f548b4-96cb-4b57-8720-2c8a4ba347e8) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190: Duplicate acknowledgement
> W0416 00:41:28.416702 124539 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: f320b354-067d-4421-a650-aba7a17b219e) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> I0416 00:41:28.416910 124530 slave.cpp:4051] Forwarding the update TASK_RUNNING (UUID: f320b354-067d-4421-a650-aba7a17b219e) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190 to master@10.160.41.62:5050
> W0416 00:41:28.425237 124537 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: a918f5ed-a604-415a-ad62-5a34fb6334ef) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> I0416 00:41:28.425361 124525 slave.cpp:4051] Forwarding the update TASK_RUNNING (UUID: a918f5ed-a604-415a-ad62-5a34fb6334ef) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190 to master@10.160.41.62:5050
> I0416 00:41:36.723466 124518 status_update_manager.cpp:395] Received status update acknowledgement (UUID: b99b15d6-d9af-4395-9c46-709f91db90e1) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> W0416 00:41:36.723588 124518 status_update_manager.cpp:761] Duplicate status update acknowledgment (UUID: b99b15d6-d9af-4395-9c46-709f91db90e1) for update TASK_RUNNING (UUID: f320b354-067d-4421-a650-aba7a17b219e) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> E0416 00:41:36.723650 124518 slave.cpp:2951] Failed to handle status update acknowledgement (UUID: b99b15d6-d9af-4395-9c46-709f91db90e1) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190: Duplicate acknowledgement
> I0416 00:41:36.730357 124527 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 67f548b4-96cb-4b57-8720-2c8a4ba347e8) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> W0416 00:41:36.730417 124527 status_update_manager.cpp:761] Duplicate status update acknowledgment (UUID: 67f548b4-96cb-4b57-8720-2c8a4ba347e8) for update TASK_RUNNING (UUID: a918f5ed-a604-415a-ad62-5a34fb6334ef) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> E0416 00:41:36.730465 124518 slave.cpp:2951] Failed to handle status update acknowledgement (UUID: 67f548b4-96cb-4b57-8720-2c8a4ba347e8) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190: Duplicate acknowledgement
> I0416 00:41:43.769918 124532 http.cpp:277] HTTP GET for /slave(1)/state.json from 10.65.5.13:42600 with User-Agent='filebundle-agent'
> W0416 00:41:48.417913 124539 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: f320b354-067d-4421-a650-aba7a17b219e) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
> I0416 00:41:48.418166 124538 slave.cpp:4051] Forwarding the update TASK_RUNNING (UUID: f320b354-067d-4421-a650-aba7a17b219e) for task node-12__50d38098-f026-4720-b374-a94e7171c830 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190 to master@10.160.41.62:5050
> W0416 00:41:48.425532 124539 status_update_manager.cpp:478] Resending status update TASK_RUNNING (UUID: a918f5ed-a604-415a-ad62-5a34fb6334ef) for task node-12__aea9be06-3d26-42c1-ac00-e7b840d47699 of framework 5f606c54-8caa-4e69-afdf-7f2c79acc7e6-1190
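The "Resending status update" lines above come from the agent's retry behavior: the status update manager keeps re-forwarding the oldest unacknowledged update, backing off between attempts, until a matching acknowledgement arrives, so an update whose ack never lands is retried indefinitely. A toy simulation of that loop (illustrative only, not the Mesos implementation; the intervals are made up and the UUIDs shortened):

    // Toy simulation of the agent's resend loop; NOT the Mesos implementation.
    #include <deque>
    #include <iostream>
    #include <string>

    struct Update {
      std::string uuid;
      std::string state;
    };

    int main() {
      // Two unacknowledged TASK_RUNNING updates, oldest first.
      std::deque<Update> pending = {
        {"f320b354", "TASK_RUNNING"},
        {"a918f5ed", "TASK_RUNNING"},
      };

      int attempts = 0;
      int backoffSeconds = 10;  // made-up starting interval

      while (!pending.empty() && attempts < 10) {
        const Update& front = pending.front();
        std::cout << "Resending status update " << front.state
                  << " (UUID: " << front.uuid << "), next retry in "
                  << backoffSeconds << "s" << std::endl;

        ++attempts;
        backoffSeconds *= 2;  // back off between retries

        // Pretend the matching acknowledgement only arrives on the 4th attempt;
        // until then the same update is re-forwarded over and over.
        if (attempts == 4) {
          std::cout << "Received acknowledgement for " << front.uuid << std::endl;
          pending.pop_front();  // only now can the next update be forwarded
          attempts = 0;
          backoffSeconds = 10;
        }
      }
      return 0;
    }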
> I discussed this with @zhitao. Here is a plausible explanation of the bug. The executor sends updates to the agent every 60 seconds, and the agent maintains them in the pending <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.hpp#L173> queue. Now when the acks come, they can arrive in any order relative to the status updates, but the _handle section pops <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L888> the last update from the queue without making sure the ack was for that status update.
>
> On Tue, Apr 10, 2018 at 6:43 PM, Benjamin Mahler <bmah...@apache.org> wrote:
>
> > Do you have logs? Which acknowledgements did the agent receive? Which TASK_RUNNING in the sequence was it re-sending?
> >
> > On Tue, Apr 10, 2018 at 6:41 PM, Benjamin Mahler <bmah...@apache.org> wrote:
> >
> > > > Issue is that the *old executor reference is held by the slave* (assuming it did not receive an acknowledgement, whereas the master and scheduler have processed the status updates), so it continues to retry TASK_RUNNING infinitely.
> > >
> > > The agent only retries so long as it does not get an acknowledgement; is the scheduler acknowledging the duplicate updates or ignoring them?
> > >
> > > On Mon, Apr 9, 2018 at 12:10 PM, Varun Gupta <var...@uber.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We are running into an issue with the slave status update manager. Below is the behavior I am seeing.
> > > >
> > > > Our use case is that we run a stateful container (a Cassandra process). Here, the Executor polls the JMX port at a 60 second interval to get the Cassandra state and sends that state to agent -> master -> framework.
> > > >
> > > > *A RUNNING Cassandra process translates to TASK_RUNNING.*
> > > > *A CRASHED or DRAINED Cassandra process translates to TASK_FAILED.*
> > > >
> > > > At some point the slave has multiple TASK_RUNNING status updates in the stream followed by TASK_FAILED, if acknowledgements are pending. We use explicit acknowledgements, and I see that the Mesos Master receives all TASK_RUNNING updates and then TASK_FAILED, and the Framework also receives all TASK_RUNNING updates followed by TASK_FAILED. After receiving TASK_FAILED, the Framework restarts a different executor on the same machine using the old persistent volume.
> > > >
> > > > The issue is that the *old executor reference is held by the slave* (assuming it did not receive an acknowledgement, whereas the master and scheduler have processed the status updates), so it continues to retry TASK_RUNNING infinitely. Here, the old executor process is not running, while the new executor process is running and continues to work as-is. This makes me believe there is some bug in the slave status update manager.
> > > >
> > > > I read the slave status update manager code; recover <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L203> has a constraint <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L239> to ignore status updates from the stream if the last executor run is completed.
> > > >
> > > > I think a similar constraint should be applicable for status update <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L318> and acknowledgement <https://github.com/apache/mesos/blob/master/src/slave/task_status_update_manager.cpp#L760>.
> > > >
> > > > Thanks,
> > > > Varun
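To make the proposed constraint concrete, here is a hypothetical sketch (not the Mesos source; Agent, completedRuns, and the method names are invented for illustration) of a guard that drops status updates and acknowledgements once the executor run they belong to has completed, mirroring the check that recover() already applies:

    // Hypothetical sketch of the guard proposed above; NOT the Mesos source.
    #include <iostream>
    #include <set>
    #include <string>

    struct Agent {
      // Executor run IDs whose run has already completed.
      std::set<std::string> completedRuns;

      bool shouldIgnore(const std::string& runId, const std::string& what) {
        if (completedRuns.count(runId) > 0) {
          std::cout << "Ignoring " << what << " for completed executor run "
                    << runId << std::endl;
          return true;
        }
        return false;
      }

      void update(const std::string& runId, const std::string& state) {
        if (shouldIgnore(runId, "status update " + state)) return;
        std::cout << "Handling " << state << " for run " << runId << std::endl;
      }

      void acknowledgement(const std::string& runId, const std::string& uuid) {
        if (shouldIgnore(runId, "acknowledgement " + uuid)) return;
        std::cout << "Handling ack " << uuid << " for run " << runId << std::endl;
      }
    };

    int main() {
      Agent agent;
      agent.completedRuns.insert("old-run");

      agent.update("old-run", "TASK_RUNNING");       // dropped: run completed
      agent.update("new-run", "TASK_RUNNING");       // handled
      agent.acknowledgement("old-run", "67f548b4");  // dropped: run completed
      return 0;
    }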
> > > > On Fri, Mar 16, 2018 at 7:47 PM, Benjamin Mahler <bmah...@apache.org> wrote:
> > > >
> > > > > (1) Assuming you're referring to the scheduler's acknowledgement of a status update, the agent will not forward TS2 until TS1 has been acknowledged. So, TS2 will not be acknowledged before TS1 is acknowledged. FWICT, we'll ignore any violation of this ordering and log a warning.
> > > > >
> > > > > (2) To reverse the question, why would it make sense to ignore them? Assuming you're looking to reduce the number of round trips needed for schedulers to see the terminal update, I would point you to: https://issues.apache.org/jira/browse/MESOS-6941
> > > > >
> > > > > (3) When the agent sees an executor terminate, it will transition all non-terminal tasks assigned to that executor to TASK_GONE (partition aware framework) or TASK_LOST (non partition aware framework) or TASK_FAILED (if the container OOMed). There may be other cases; it looks a bit convoluted to me.
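Answer (3) can be restated compactly as a mapping like the following (a simplification with invented names; as noted above, the real agent logic has more cases):

    // Simplified model of answer (3); the actual agent logic has more cases.
    #include <iostream>
    #include <string>

    std::string terminalStateOnExecutorTermination(bool partitionAwareFramework,
                                                   bool containerOOMed) {
      if (containerOOMed) {
        return "TASK_FAILED";  // container hit its memory limit
      }
      // Partition-aware frameworks get TASK_GONE, others get TASK_LOST.
      return partitionAwareFramework ? "TASK_GONE" : "TASK_LOST";
    }

    int main() {
      std::cout << terminalStateOnExecutorTermination(true, false) << std::endl;   // TASK_GONE
      std::cout << terminalStateOnExecutorTermination(false, false) << std::endl;  // TASK_LOST
      std::cout << terminalStateOnExecutorTermination(false, true) << std::endl;   // TASK_FAILED
      return 0;
    }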
> > > > > On Thu, Mar 15, 2018 at 10:35 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > While designing the correct behavior with one of our frameworks, we encountered some questions about the behavior of status updates.
> > > > > >
> > > > > > The executor continuously polls the workload probe to get the current mode of the workload (a Cassandra server), and sends various status update states (STARTING, RUNNING, FAILED, etc.).
> > > > > >
> > > > > > The executor polls every 30 seconds and sends a status update. Here, we are seeing congestion on task update acknowledgements somewhere (still unknown).
> > > > > >
> > > > > > There are three scenarios that we want to understand:
> > > > > >
> > > > > > 1. The agent queue has task updates TS1, TS2 & TS3 (in this order) waiting on acknowledgement. Suppose TS2 receives an acknowledgement; what will happen to the TS1 update in the queue?
> > > > > >
> > > > > > 2. The agent queue has task updates TS1, TS2, TS3 & TASK_FAILED. Here, TS1, TS2, TS3 are non-terminal updates. Once the agent has received a terminal status update, does it make sense to ignore the non-terminal updates in the queue?
> > > > > >
> > > > > > 3. As per the Executor Driver code comment <https://github.com/apache/mesos/blob/master/src/java/src/org/apache/mesos/ExecutorDriver.java#L86>, if the executor is terminated, does the agent send TASK_LOST? If so, does it send it once or for each unacknowledged status update?
> > > > > >
> > > > > > I'll study the code in the status update manager and agent separately, but some official answer will definitely help.
> > > > > >
> > > > > > Many thanks!
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > >
> > > > > > Zhitao Li