[ 
https://issues.apache.org/jira/browse/MESOS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735708#comment-16735708
 ] 

Benjamin Bannier commented on MESOS-9434:
-----------------------------------------

A similar issue exists for when a master can send a 
{{ShutdownFrameworkMessage}}. If an agent is partitioned from the cluster for a 
long time and does not learn that a framework completed (because it was 
partitioned at the time of completion), then after a master failover and 
resubscription of the agent the new master would 1) not know that the framework 
completed, and 2) even learn about the framework from the resubscribed agent.

Since we currently do not handle this case reliably either, the first 
suggestion above seems more consistent (i.e., have a master acknowledge 
operation status updates of frameworks it currently knows are removed). Note 
that if we were to, e.g., persist completed {{FrameworkID}} values in the 
future, this solution would naturally work as well.
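
To make the first suggestion concrete, here is a rough sketch of the decision 
a master would take for an incoming operation status update. It is a toy model 
only: {{ToyMaster}}, {{Action}}, {{new_master_after_failover}} and all method 
names are made up for illustration and are not Mesos APIs. The sketch also 
assumes that a framework the master only learned about from an agent never 
acknowledges anything.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ACK_BY_MASTER = "master acknowledges; the update stream can terminate"
    FORWARD = "forwarded to the framework, which acknowledges"
    UNACKNOWLEDGED = "nobody acknowledges; the agent keeps retrying"


@dataclass
class ToyMaster:
    """Hypothetical model of master state; not the actual Mesos master."""
    subscribed: set = field(default_factory=set)
    completed: set = field(default_factory=set)  # in memory unless persisted
    recovered: set = field(default_factory=set)  # learned from agents only

    def complete(self, framework_id):
        self.subscribed.discard(framework_id)
        self.completed.add(framework_id)

    def agent_resubscribed(self, framework_ids):
        # Frameworks the master has no record of are merely "recovered".
        for fid in framework_ids:
            if fid not in self.completed and fid not in self.subscribed:
                self.recovered.add(fid)

    def on_operation_status_update(self, framework_id):
        if framework_id in self.completed:
            return Action.ACK_BY_MASTER
        if framework_id in self.subscribed:
            return Action.FORWARD
        # Unknown or only recovered: assumed to stay unacknowledged.
        return Action.UNACKNOWLEDGED


def new_master_after_failover(persisted_completed=()):
    """A new master keeps only persisted state (assumed empty today)."""
    return ToyMaster(completed=set(persisted_completed))


if __name__ == "__main__":
    master = ToyMaster(subscribed={"fw"})
    master.complete("fw")
    print(master.on_operation_status_update("fw").name)

    # Failover without persistence, then a partitioned agent resubscribes.
    fresh = new_master_after_failover()
    fresh.agent_resubscribed({"fw"})
    print(fresh.on_operation_status_update("fw").name)
```

In this model the failover case is the only one that leaves the stream 
retrying, and passing the completed IDs to {{new_master_after_failover}} 
(standing in for persisted {{FrameworkID}} values) closes that gap without any 
change to the acknowledgement logic.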

The second suggestion above, in which masters explicitly inform status update 
managers of framework completion, does not work reliably either when status 
update managers are partitioned at the time of completion and during 
subsequent master failovers.
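
The failure mode of the second suggestion can be illustrated with another toy 
model. Again, nothing here is Mesos code: {{ToyStatusUpdateManager}} and 
{{complete_framework}} are hypothetical names, and the model assumes the 
notification is sent once at completion time and is never retried.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatusUpdateManager:
    """Hypothetical stand-in for an agent/RP status update manager."""
    # Framework IDs that still have unacknowledged update streams.
    streams: set = field(default_factory=set)
    partitioned: bool = False

    def deliver_framework_completed(self, framework_id):
        """Returns True if the notification arrived and streams were GC'ed."""
        if self.partitioned:
            return False  # message lost; assumed not to be retried
        self.streams.discard(framework_id)
        return True


def complete_framework(completed, managers, framework_id):
    """Master side: record completion in memory and notify every manager."""
    completed.add(framework_id)
    return [m.deliver_framework_completed(framework_id) for m in managers]


if __name__ == "__main__":
    reachable = ToyStatusUpdateManager(streams={"fw"})
    isolated = ToyStatusUpdateManager(streams={"fw"}, partitioned=True)

    completed = set()
    complete_framework(completed, [reachable, isolated], "fw")

    # Master failover: the in-memory record of completed frameworks is lost,
    # so the new master has nothing to resend once the partition heals.
    completed = set()
    isolated.partitioned = False

    print("reachable still retrying:", "fw" in reachable.streams)
    print("isolated still retrying:", "fw" in isolated.streams)
```

The manager that was reachable drops its streams, while the one that was 
partitioned keeps them, and after the failover no component remembers that a 
notification is still owed.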

> Completed framework update streams may retry forever
> ----------------------------------------------------
>
>                 Key: MESOS-9434
>                 URL: https://issues.apache.org/jira/browse/MESOS-9434
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, resource provider
>    Affects Versions: 1.7.0
>            Reporter: Greg Mann
>            Assignee: Benjamin Bannier
>            Priority: Major
>              Labels: mesosphere
>
> Since the agent/RP currently does not GC operation status update streams when 
> frameworks are torn down, it's possible that active update streams associated 
> with completed frameworks may remain and continue retrying forever. We should 
> add a mechanism to complete these streams when the framework becomes 
> completed.
> A couple options which have come up during discussion:
> * Have the master acknowledge updates associated with completed frameworks. 
> Note that since completed frameworks are currently only tracked by the master 
> in memory, a master failover could prevent this from working perfectly.
> * Extend the RP API to allow the GC of particular update streams, and have 
> the agent GC streams associated with a framework when it receives a 
> {{ShutdownFrameworkMessage}}. This would also require the addition of a new 
> method to the status update manager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
