[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303587#comment-15303587 ]
Jay Guo commented on MESOS-5468:
--------------------------------
See steps to reproduce in my first comment.
> Add logic in long-lived-framework to handle network partitions.
> ---------------------------------------------------------------
>
> Key: MESOS-5468
> URL: https://issues.apache.org/jira/browse/MESOS-5468
> Project: Mesos
> Issue Type: Task
> Components: framework, master
> Reporter: Jay Guo
>
> Currently, long-lived-framework does not handle network partitions, i.e., it
> does not explicitly try to {{reconnect}} with the master after receiving no
> {{HEARTBEAT}} events for a prolonged period. If the master disconnects a
> framework without the framework being aware of it (a one-way partition), the
> framework should explicitly issue a {{reconnect}} request via the scheduler
> library after a certain period of time.
> *On the other hand*, should we close the TCP socket on the master side when
> tearing down a framework? Currently the TCP socket is left open even after the
> framework has been deactivated. This results in the framework sending invalid
> {{Call}} messages to the master and in re-detection.
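
The reconnect-on-missed-heartbeat behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions: {{HeartbeatMonitor}}, its methods, and the "missed intervals" threshold are hypothetical names that do not exist in Mesos, and the callback stands in for whatever the framework would use to issue the {{reconnect}} request through the scheduler library.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper, not part of Mesos: records when HEARTBEAT events
// arrive and invokes a caller-supplied callback once too many heartbeat
// intervals have passed in silence.
class HeartbeatMonitor {
public:
  using Clock = std::chrono::steady_clock;

  // `interval` is the expected time between heartbeats, `missedAllowed` is
  // how many intervals may elapse before reconnecting (an assumed policy),
  // and `reconnect` is the action to take (assumed to wrap the scheduler
  // library's reconnect request).
  HeartbeatMonitor(std::chrono::milliseconds interval,
                   int missedAllowed,
                   std::function<void()> reconnect)
    : interval_(interval),
      missedAllowed_(missedAllowed),
      reconnect_(std::move(reconnect)) {}

  // Call when the framework subscribes and on every HEARTBEAT event.
  void heartbeat(Clock::time_point now) {
    last_ = now;
    armed_ = true;
  }

  // Call periodically, e.g. from a timer. Returns true if a reconnect
  // was requested during this call.
  bool check(Clock::time_point now) {
    if (!armed_) {
      return false;  // No heartbeat seen since the last reconnect request.
    }
    if (now - last_ <= interval_ * missedAllowed_) {
      return false;  // Still within the allowed silence window.
    }
    armed_ = false;  // Fire once, then wait for the next heartbeat.
    reconnect_();
    return true;
  }

private:
  std::chrono::milliseconds interval_;
  int missedAllowed_;
  std::function<void()> reconnect_;
  Clock::time_point last_{};
  bool armed_ = false;
};
```

Passing the current time in explicitly keeps the timeout logic independent of the event loop, so a framework could drive {{check}} from whatever timer facility it already uses. Disarming after a reconnect request avoids issuing repeated requests while the new connection is still being established.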
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)