[ https://issues.apache.org/jira/browse/MESOS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735708#comment-16735708 ]
Benjamin Bannier commented on MESOS-9434:
-----------------------------------------
A similar issue exists with deciding when a master can send a
{{ShutdownFrameworkMessage}}. If an agent is partitioned from the cluster when a
framework completes and stays partitioned for a long time, it never learns of
the completion. After a master failover and the agent's resubscription, the new
master would 1) not know that the framework completed, and even 2) learn about
the framework from the resubscribed agent.
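The failover scenario can be modeled with a minimal sketch, assuming a master
that tracks completed frameworks only in memory. `Master`, `teardown`,
`reregister_agent` and `fail_over` are hypothetical names chosen for
illustration; this is not Mesos code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master that keeps framework state only in memory."""
    active: set = field(default_factory=set)
    completed: set = field(default_factory=set)

    def teardown(self, framework_id: str) -> None:
        # Record the completion, but only in this process's memory.
        self.active.discard(framework_id)
        self.completed.add(framework_id)

    def reregister_agent(self, agent_frameworks: set) -> None:
        # Frameworks reported by the agent are (re)learned unless this
        # master knows they completed.
        self.active |= {f for f in agent_frameworks if f not in self.completed}


def fail_over() -> Master:
    # Completed frameworks are not persisted, so the new leader starts empty.
    return Master()


# The agent is partitioned while fw-1 is torn down and never hears about it.
master = Master(active={"fw-1"})
master.teardown("fw-1")
master = fail_over()
master.reregister_agent({"fw-1"})
# Now master.completed is empty and master.active == {"fw-1"}.
```

Without the failover, the same `reregister_agent` call would leave `fw-1` out of
`active`, because the completion would still be remembered.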
Since we currently do not handle this case reliably either, the first
suggestion in the issue description seems more consistent (i.e., have a master
acknowledge operation status updates of frameworks it currently knows are
removed). Note that if we, e.g., persist completed {{FrameworkID}} values in the
future, this solution would work naturally as well.
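A minimal sketch of the first suggestion, assuming the master sees each
operation status update before it would be forwarded to the framework.
`OperationUpdate`, `Master` and `handle_update` are hypothetical names; the
master's real update handling is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class OperationUpdate:
    """Hypothetical stand-in for an operation status update."""
    framework_id: str
    operation_uuid: str


@dataclass
class Master:
    # Frameworks this master knows were removed. Today this would only be
    # populated in memory; it could instead be reloaded from storage.
    completed_frameworks: set = field(default_factory=set)
    acknowledged: list = field(default_factory=list)
    forwarded: list = field(default_factory=list)

    def handle_update(self, update: OperationUpdate) -> None:
        if update.framework_id in self.completed_frameworks:
            # No framework will ever acknowledge this update, so the master
            # acknowledges it itself, letting the agent/RP finish the stream.
            self.acknowledged.append(update.operation_uuid)
        else:
            # Otherwise the update goes to the framework as usual.
            self.forwarded.append(update)
```

The check only helps for frameworks the master knows about; if completed
`FrameworkID` values were persisted and loaded into `completed_frameworks` after
a failover, `handle_update` itself would not need to change.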
The second suggestion in the issue description, having masters explicitly
inform status update managers of framework completion, does not work reliably
either when status update managers are partitioned at the time of completion
and a master failover follows.
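The second suggestion, and why a partition defeats it, can be sketched as
follows. `StatusUpdateManager.cleanup_framework` is a hypothetical stand-in for
the new status update manager method mentioned in the issue description, and
`on_shutdown_framework` stands in for the agent handling a
{{ShutdownFrameworkMessage}}; the `agent_partitioned` flag is a simplification
of message loss, not a real parameter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StatusUpdateManager:
    """Hypothetical agent-side manager; one stream per (framework, operation)."""
    streams: set = field(default_factory=set)

    def add_update(self, framework_id: str, operation_uuid: str) -> None:
        self.streams.add((framework_id, operation_uuid))

    def cleanup_framework(self, framework_id: str) -> None:
        # Hypothetical new method: GC every stream owned by the framework.
        self.streams = {s for s in self.streams if s[0] != framework_id}


def on_shutdown_framework(manager: StatusUpdateManager, framework_id: str,
                          agent_partitioned: bool) -> None:
    # A partitioned agent never receives the shutdown message, and after a
    # master failover no master remembers to resend it.
    if not agent_partitioned:
        manager.cleanup_framework(framework_id)
```

When the message is delivered, only the completed framework's streams are
removed; when it is lost, the stream stays and keeps retrying.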
> Completed framework update streams may retry forever
> ----------------------------------------------------
>
> Key: MESOS-9434
> URL: https://issues.apache.org/jira/browse/MESOS-9434
> Project: Mesos
> Issue Type: Bug
> Components: agent, resource provider
> Affects Versions: 1.7.0
> Reporter: Greg Mann
> Assignee: Benjamin Bannier
> Priority: Major
> Labels: mesosphere
>
> Since the agent/RP currently does not GC operation status update streams when
> frameworks are torn down, active update streams associated with completed
> frameworks may remain and retry forever. We should add a mechanism to
> complete these streams when the framework completes.
> A couple of options that have come up during discussion:
> * Have the master acknowledge updates associated with completed frameworks.
> Note that since completed frameworks are currently only tracked by the master
> in memory, a master failover could prevent this from working perfectly.
> * Extend the RP API to allow the GC of particular update streams, and have
> the agent GC streams associated with a framework when it receives a
> {{ShutdownFrameworkMessage}}. This would also require the addition of a new
> method to the status update manager.
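To illustrate how a stream can keep retrying forever, here is a sketch assuming
capped exponential backoff with no limit on the number of attempts.
`UpdateStream` and the 10 s and 600 s values are assumptions for illustration,
not the agent's or resource provider's actual retry parameters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UpdateStream:
    """Hypothetical retrying stream for one operation's status updates."""
    initial_backoff: float = 10.0  # seconds (assumed value)
    max_backoff: float = 600.0     # seconds (assumed cap)
    acknowledged: bool = False
    _delay: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._delay = self.initial_backoff

    def next_delay(self) -> float | None:
        """Delay before the next retry, or None once the update is acked."""
        if self.acknowledged:
            return None
        delay = self._delay
        # Double the delay up to the cap. Nothing limits the number of
        # retries, so an update that is never acknowledged never stops.
        self._delay = min(self._delay * 2, self.max_backoff)
        return delay
```

Only an acknowledgement ends the stream, which is why both options above aim to
either supply one or remove the stream.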