[
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303509#comment-15303509
]
Jay Guo commented on MESOS-5468:
--------------------------------
To reproduce:
* Start the master and an agent
* Run long-lived-framework
* Run {{# iptables -A OUTPUT -p tcp -d <master-ip> --dport 5050 -j DROP}} on
the framework machine to emulate a network partition
* Wait until the master deactivates the framework
* Remove the iptables rule added above to emulate the network rejoining
* Check the logs of both long-lived-framework and the master. {{netstat -tpn}}
also shows an enormous number of {{TIME_WAIT}} sockets, which are the result of
re-detection
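
To put a number on the {{TIME_WAIT}} observation in the last step, the netstat output can be tallied per TCP state. The following is only a rough sketch: {{count_tcp_states}} is a hypothetical helper, the sample rows are made up for illustration (not captured from a real run), and it assumes the usual Linux {{netstat -tn}} column layout and the master port 5050 used in the iptables rule above.

```python
from collections import Counter


def count_tcp_states(netstat_output: str, remote_port: int = 5050) -> Counter:
    """Count TCP states for connections whose foreign address uses remote_port."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto recv-q send-q local foreign state [pid/program].
        # Header lines do not start with "tcp" and are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign = fields[4]
        if foreign.rsplit(":", 1)[-1] == str(remote_port):
            states[fields[5]] += 1
    return states


# Illustrative input only; these rows are invented, not real log output.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.2:40112          10.0.0.1:5050           TIME_WAIT   -
tcp        0      0 10.0.0.2:40114          10.0.0.1:5050           TIME_WAIT   -
tcp        0      0 10.0.0.2:40120          10.0.0.1:5050           ESTABLISHED 4242/long-lived-fra
tcp        0      0 10.0.0.2:22             10.0.0.9:51000          ESTABLISHED -
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

On the framework machine, the text passed in would be the captured output of {{netstat -tpn}}; a {{TIME_WAIT}} count that keeps growing after the iptables rule is removed would match the behaviour described above.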
> Add logic to long-lived-framework to handle HEARTBEAT timeout
> -------------------------------------------------------------
>
> Key: MESOS-5468
> URL: https://issues.apache.org/jira/browse/MESOS-5468
> Project: Mesos
> Issue Type: Bug
> Components: framework, master
> Reporter: Jay Guo
>
> Currently long-lived-framework does not handle HEARTBEAT timeout. If the master
> tears down the framework without the framework being aware of it (e.g. during a
> network partition), the framework keeps waiting for an {{Event}} until it is
> reconnected.
> *On the other hand*, should we close the TCP socket on the master side when
> tearing down a framework? Currently the TCP socket is left alive even after the
> framework has been deactivated. This results in the framework sending invalid
> {{Call}}s to the master and triggering re-detection.
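
For illustration, HEARTBEAT timeout handling on the framework side could look roughly like the sketch below. This is not the actual change for MESOS-5468, and long-lived-framework itself is not written in Python. {{HeartbeatMonitor}} and its parameters are hypothetical names, and the choice of tolerating five missed heartbeats is an assumption. It relies only on the scheduler API behaviour that the master advertises a heartbeat interval when the framework subscribes and then sends HEARTBEAT events periodically.

```python
import time


class HeartbeatMonitor:
    """Tracks the time since the last event and reports a timeout.

    Hypothetical helper: the framework would create it after subscribing,
    using the heartbeat interval advertised by the master.
    """

    def __init__(self, interval_seconds, missed_allowed=5, clock=time.monotonic):
        # Assumption: declare the connection dead after `missed_allowed`
        # consecutive heartbeat intervals pass with no event at all.
        self.timeout = interval_seconds * missed_allowed
        self._clock = clock
        self._last_event = clock()

    def on_event(self):
        """Call for every received event, HEARTBEAT or otherwise."""
        self._last_event = self._clock()

    def timed_out(self):
        """True once no event has arrived within the timeout window."""
        return self._clock() - self._last_event > self.timeout


if __name__ == "__main__":
    # Fake clock so the example runs instantly.
    now = [0.0]
    monitor = HeartbeatMonitor(15.0, clock=lambda: now[0])
    now[0] = 80.0  # more than 5 * 15 seconds without any event
    print(monitor.timed_out())
```

On a timeout the framework would presumably drop its current connection and subscribe again, rather than keep waiting for an {{Event}} on a socket the master no longer considers active.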