[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309244#comment-15309244 ]
Jay Guo commented on MESOS-5468:
--------------------------------

[~anandmazumdar] Sorry for the delay. When the master attempts to remove the framework, one of the two connections between the framework and the master is closed successfully, but the other one is left in the ESTABLISHED state. After the network rejoins, the master repeatedly refuses the framework's subscription calls. So the question is: is the EVENT connection left open intentionally or accidentally? Here are the full logs:

{code:title=master.log}
I0601 12:12:03.671700 2252 master.cpp:5195] Status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 from agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:12:03.671931 2252 master.cpp:5243] Forwarding status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:12:03.672360 2252 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0601 12:14:43.677433 2247 master.cpp:5195] Status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 from agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:14:43.677781 2247 master.cpp:5243] Forwarding status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:14:43.678387 2247 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0601 12:20:03.679064 2251 master.cpp:5195] Status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 from agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:20:03.679194 2251 master.cpp:5243] Forwarding status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:20:03.679565 2251 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
E0601 12:25:02.891707 2254 process.cpp:2040] Failed to shutdown socket with fd 13: Transport endpoint is not connected
I0601 12:25:02.895753 2248 master.cpp:1388] Framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++)) disconnected
I0601 12:25:02.896077 2248 master.cpp:2822] Disconnecting framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++))
I0601 12:25:02.896289 2248 master.cpp:2846] Deactivating framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++))
W0601 12:25:02.896682 2248 master.hpp:1903] Master attempted to send message to disconnected framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++))
W0601 12:25:02.897027 2248 master.hpp:1909] Unable to send event to framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++)): connection closed
I0601 12:25:02.897341 2248 master.cpp:1401] Giving framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++)) 0ns to failover
I0601 12:25:02.896751 2249 hierarchical.cpp:375] Deactivated framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:25:02.901005 2251 master.cpp:5608] Framework failover timeout, removing framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++))
I0601 12:25:02.901053 2251 master.cpp:6338] Removing framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (Long Lived Framework (C++))
I0601 12:25:02.901409 2251 master.cpp:6853] Updating the state of task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (latest state: TASK_FINISHED, status update state: TASK_KILLED)
I0601 12:25:02.901449 2251 master.cpp:6919] Removing task 3 with resources cpus(*):0.001; mem(*):1 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 on agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:25:02.901721 2251 master.cpp:6948] Removing executor 'default' with resources cpus(*):0.1; mem(*):32 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 on agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:25:02.902426 2251 hierarchical.cpp:326] Removed framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
W0601 12:25:08.007905 2253 master.cpp:5291] Ignoring unknown exited executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 on agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:27:03.197347 2249 http.cpp:483] HTTP POST for /master/api/v1/scheduler from 192.168.56.102:44256
I0601 12:27:03.197860 2249 master.cpp:2243] Received subscription request for HTTP framework 'Long Lived Framework (C++)'
I0601 12:27:03.198251 2249 master.cpp:2282] Refusing subscription of framework 'Long Lived Framework (C++)': Framework has been removed
I0601 12:27:03.207706 2247 http.cpp:483] HTTP POST for /master/api/v1/scheduler from 192.168.56.102:44258
I0601 12:27:03.208819 2247 master.cpp:2243] Received subscription request for HTTP framework 'Long Lived Framework (C++)'
I0601 12:27:03.208865 2247 master.cpp:2282] Refusing subscription of framework 'Long Lived Framework (C++)': Framework has been removed
I0601 12:27:03.216269 2248 http.cpp:483] HTTP POST for /master/api/v1/scheduler from 192.168.56.102:44260
I0601 12:27:03.217339 2248 master.cpp:2243] Received subscription request for HTTP framework 'Long Lived Framework (C++)'
I0601 12:27:03.217389 2248 master.cpp:2282] Refusing subscription of framework 'Long Lived Framework (C++)': Framework has been removed
I0601 12:27:03.224937 2252 http.cpp:483] HTTP POST for /master/api/v1/scheduler from 192.168.56.102:44262
I0601 12:27:03.225971 2252 master.cpp:2243] Received subscription request for HTTP framework 'Long Lived Framework (C++)'
I0601 12:27:03.226092 2252 master.cpp:2282] Refusing subscription of framework 'Long Lived Framework (C++)': Framework has been removed
{code}

{code:title=agent.log}
I0601 12:09:25.652889 2277 slave.cpp:1541] Got assigned task 3 for framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:25.653333 2277 slave.cpp:1660] Launching task 3 for framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:25.653484 2277 slave.cpp:1899] Queuing task '3' for executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (via HTTP)
I0601 12:09:25.654451 2277 slave.cpp:2051] Sending queued task '3' to executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (via HTTP)
I0601 12:09:25.658676 2273 http.cpp:192] HTTP POST for /slave(1)/api/v1/executor from 127.0.0.1:59021
I0601 12:09:25.659063 2273 slave.cpp:3257] Handling status update TASK_RUNNING (UUID: b41c5a20-b8fd-462b-96eb-d1f1d8eea7a7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:25.660598 2276 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: b41c5a20-b8fd-462b-96eb-d1f1d8eea7a7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:25.661355 2273 slave.cpp:3655] Forwarding the update TASK_RUNNING (UUID: b41c5a20-b8fd-462b-96eb-d1f1d8eea7a7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
I0601 12:09:25.669483 2274 status_update_manager.cpp:392] Received status update acknowledgement (UUID: b41c5a20-b8fd-462b-96eb-d1f1d8eea7a7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:33.661249 2274 http.cpp:192] HTTP POST for /slave(1)/api/v1/executor from 127.0.0.1:59021
I0601 12:09:33.661631 2274 slave.cpp:3257] Handling status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:33.663990 2277 status_update_manager.cpp:320] Received status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:33.664456 2274 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
W0601 12:09:43.665722 2272 status_update_manager.cpp:475] Resending status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:09:43.666304 2272 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
W0601 12:10:03.667471 2278 status_update_manager.cpp:475] Resending status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:10:03.667896 2278 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
I0601 12:10:15.903790 2278 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948704967727812days
W0601 12:10:43.668305 2275 status_update_manager.cpp:475] Resending status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:10:43.668552 2275 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
I0601 12:11:15.905623 2272 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948704967727812days
W0601 12:12:03.669714 2276 status_update_manager.cpp:475] Resending status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:12:03.670492 2276 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
I0601 12:12:15.908663 2275 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948704967727812days
I0601 12:13:15.909787 2279 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948704967727812days
I0601 12:14:15.910768 2276 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
W0601 12:14:43.675572 2272 status_update_manager.cpp:475] Resending status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:14:43.676182 2272 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
I0601 12:15:15.913182 2274 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:16:15.913941 2278 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:17:15.915241 2277 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:18:15.916179 2275 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:19:15.921547 2272 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
W0601 12:20:03.677420 2276 status_update_manager.cpp:475] Resending status update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:20:03.677929 2276 slave.cpp:3655] Forwarding the update TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 to master@192.168.56.101:5050
I0601 12:20:15.925854 2278 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:21:15.926455 2277 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:22:15.928486 2278 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:23:15.929718 2277 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:24:15.934732 2275 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:25:02.903633 2279 slave.cpp:2264] Asked to shut down framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 by master@192.168.56.101:5050
I0601 12:25:02.903692 2279 slave.cpp:2289] Shutting down framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:25:02.903791 2279 slave.cpp:4452] Shutting down executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (via HTTP)
I0601 12:25:07.905294 2278 slave.cpp:4525] Killing executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (via HTTP)
I0601 12:25:07.905537 2278 containerizer.cpp:1548] Destroying container '8f3a48f6-6fe9-4e35-86ae-05d25f16cc33'
I0601 12:25:08.002238 2277 containerizer.cpp:1784] Executor for container '8f3a48f6-6fe9-4e35-86ae-05d25f16cc33' has exited
I0601 12:25:08.004763 2277 provisioner.cpp:406] Ignoring destroy request for unknown container 8f3a48f6-6fe9-4e35-86ae-05d25f16cc33
I0601 12:25:08.005345 2272 slave.cpp:4134] Executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 terminated with signal Killed
I0601 12:25:08.005656 2272 slave.cpp:4238] Cleaning up executor 'default' of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000 (via HTTP)
I0601 12:25:08.006259 2277 gc.cpp:55] Scheduling '/tmp/mesos-agent/slaves/edbc3730-e55b-4390-a1f2-5de5a66497f5-S0/frameworks/e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000/executors/default/runs/8f3a48f6-6fe9-4e35-86ae-05d25f16cc33' for gc 6.99999993003259days in the future
I0601 12:25:08.006444 2272 slave.cpp:4326] Cleaning up framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:25:08.006567 2277 gc.cpp:55] Scheduling '/tmp/mesos-agent/slaves/edbc3730-e55b-4390-a1f2-5de5a66497f5-S0/frameworks/e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000/executors/default' for gc 6.99999992783704days in the future
I0601 12:25:08.007171 2279 status_update_manager.cpp:282] Closing status update streams for framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000
I0601 12:25:08.007453 2272 gc.cpp:55] Scheduling '/tmp/mesos-agent/slaves/edbc3730-e55b-4390-a1f2-5de5a66497f5-S0/frameworks/e8288e1d-2c05-4e05-9db7-713a366f7f5f-0000' for gc 6.9999999162963days in the future
I0601 12:25:15.936108 2275 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:26:15.941059 2272 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:27:15.942551 2276 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:28:15.943686 2274 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:29:15.946293 2276 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:30:15.947840 2276 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
I0601 12:31:15.951498 2276 slave.cpp:4636] Current disk usage 33.59%. Max allowed age: 3.948703230395567days
{code}

{code:title=long-lived-framework.log}
0601 12:09:20.368397 1842 scheduler.cpp:187] Version: 0.29.0
I0601 12:09:20.373440 1861 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
I0601 12:09:20.387084 1858 long_lived_framework.cpp:176] Received SUBSCRIBED event
I0601 12:09:20.392269 1858 long_lived_framework.cpp:180] Subscribed with ID '
I0601 12:09:20.392304 1858 long_lived_framework.cpp:176] Received HEARTBEAT event
I0601 12:09:20.392355 1858 long_lived_framework.cpp:176] Received OFFERS event
I0601 12:09:20.392453 1858 long_lived_framework.cpp:251] Starting executor and task 0 on ubuntu
I0601 12:09:20.513810 1857 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:20.514286 1857 long_lived_framework.cpp:285] Task 0 is in state TASK_RUNNING
I0601 12:09:20.521380 1860 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:20.522023 1860 long_lived_framework.cpp:285] Task 0 is in state TASK_FINISHED
I0601 12:09:20.632443 1859 long_lived_framework.cpp:176] Received OFFERS event
I0601 12:09:20.632786 1859 long_lived_framework.cpp:266] Starting task 1 on ubuntu
I0601 12:09:20.653120 1857 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:20.653424 1857 long_lived_framework.cpp:285] Task 1 is in state TASK_RUNNING
I0601 12:09:20.660778 1863 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:20.661046 1863 long_lived_framework.cpp:285] Task 1 is in state TASK_FINISHED
I0601 12:09:21.635004 1862 long_lived_framework.cpp:176] Received OFFERS event
I0601 12:09:21.635601 1862 long_lived_framework.cpp:266] Starting task 2 on ubuntu
I0601 12:09:21.652076 1863 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:21.652343 1863 long_lived_framework.cpp:285] Task 2 is in state TASK_RUNNING
I0601 12:09:24.656648 1856 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:24.656970 1856 long_lived_framework.cpp:285] Task 2 is in state TASK_FINISHED
I0601 12:09:25.642346 1863 long_lived_framework.cpp:176] Received OFFERS event
I0601 12:09:25.642716 1863 long_lived_framework.cpp:266] Starting task 3 on ubuntu
I0601 12:09:25.662624 1859 long_lived_framework.cpp:176] Received UPDATE event
I0601 12:09:25.662884 1859 long_lived_framework.cpp:285] Task 3 is in state TASK_RUNNING
I0601 12:09:31.655921 1861 long_lived_framework.cpp:176] Received OFFERS event
I0601 12:09:31.656085 1861 long_lived_framework.cpp:266] Starting task 4 on ubuntu
E0601 12:27:03.184195 1862 scheduler.cpp:525] Request for call type ACCEPT failed: Connection timed out
I0601 12:27:03.185212 1862 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.188406 1863 long_lived_framework.cpp:144] Disconnected!
I0601 12:27:03.188416 1862 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
E0601 12:27:03.197764 1859 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.197942 1862 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.198303 1862 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.198778 1860 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.199054 1859 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
I0601 12:27:03.199069 1860 long_lived_framework.cpp:144] Disconnected!
E0601 12:27:03.207345 1856 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.207453 1856 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.207476 1856 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.208014 1856 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.208768 1856 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
I0601 12:27:03.208793 1858 long_lived_framework.cpp:144] Disconnected!
E0601 12:27:03.215840 1861 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.215837 1860 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.215989 1860 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.216281 1857 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.216732 1860 long_lived_framework.cpp:144] Disconnected!
I0601 12:27:03.216815 1863 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
E0601 12:27:03.224658 1859 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.224709 1858 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.224736 1858 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.225445 1861 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.225950 1857 long_lived_framework.cpp:144] Disconnected!
I0601 12:27:03.225960 1859 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
E0601 12:27:03.232986 1858 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.233135 1858 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.233160 1858 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.233439 1858 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.233844 1858 long_lived_framework.cpp:144] Disconnected!
I0601 12:27:03.233839 1857 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
E0601 12:27:03.241492 1858 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.241719 1861 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.241765 1861 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.242069 1858 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.242430 1858 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
I0601 12:27:03.242446 1861 long_lived_framework.cpp:144] Disconnected!
E0601 12:27:03.250104 1859 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.250227 1861 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.250329 1861 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.250744 1858 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.251322 1863 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
I0601 12:27:03.251687 1860 long_lived_framework.cpp:144] Disconnected!
E0601 12:27:03.260287 1863 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.260484 1856 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.260998 1856 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.260911 1858 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.261420 1858 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
I0601 12:27:03.261461 1862 long_lived_framework.cpp:144] Disconnected!
E0601 12:27:03.268121 1860 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
I0601 12:27:03.268169 1863 long_lived_framework.cpp:176] Received ERROR event
E0601 12:27:03.268501 1863 long_lived_framework.cpp:216] Error: Framework has been removed
I0601 12:27:03.268941 1863 scheduler.cpp:442] Re-detecting master
I0601 12:27:03.269614 1863 long_lived_framework.cpp:144] Disconnected!
I0601 12:27:03.269891 1856 scheduler.cpp:471] New master detected at master@192.168.56.101:5050
E0601 12:27:03.277648 1860 scheduler.cpp:635] End-Of-File received from master. The master closed the event stream
{code}

> Add logic in long-lived-framework to handle network partitions.
> ---------------------------------------------------------------
>
>                 Key: MESOS-5468
>                 URL: https://issues.apache.org/jira/browse/MESOS-5468
>             Project: Mesos
>          Issue Type: Task
>          Components: framework, master
>            Reporter: Jay Guo
>
> Currently, long-lived-framework does not handle network partitions, i.e.,
> it does not explicitly try to {{reconnect}} with the master after not
> receiving {{HEARTBEAT}} events for a prolonged amount of time. If the master
> disconnects a framework without the framework being aware of it (a one-way
> partition), the framework should explicitly issue a {{reconnect}} request via
> the scheduler library after a certain period of time.
> *On the other hand*, should we close the TCP socket on the master side when
> tearing down a framework? Currently the TCP socket is left alive even after
> the framework has been deactivated. This results in the framework sending
> invalid {{Call}} messages to the master and triggering re-detection.
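
For the heartbeat part of the description above, here is a minimal sketch of the bookkeeping the framework might need to decide when to ask for a reconnect. It is only an illustration: {{HeartbeatMonitor}} and all of its member names are hypothetical and are not part of Mesos or of long-lived-framework, and the number of missed heartbeats to tolerate is an assumption left to the caller. The clock is passed in explicitly so the logic can be exercised without waiting in real time.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Mesos): remembers when the last event
// arrived from the master and reports when the silence has lasted too long.
class HeartbeatMonitor
{
public:
  typedef std::chrono::steady_clock Clock;

  // `interval` is the heartbeat interval the framework expects from the
  // master; `missedAllowed` is how many consecutive intervals may pass
  // without an event before the connection is considered dead (assumption).
  HeartbeatMonitor(std::chrono::milliseconds interval, int missedAllowed)
    : timeout_(interval * missedAllowed), armed_(false)
  {
    assert(missedAllowed > 0);
  }

  // Call when SUBSCRIBED or HEARTBEAT is received.
  void received(Clock::time_point now)
  {
    last_ = now;
    armed_ = true;
  }

  // Call when the framework learns that it is disconnected, so that a stale
  // timestamp does not trigger another reconnect before re-subscribing.
  void reset() { armed_ = false; }

  // Call periodically. Returns true if the framework is subscribed and no
  // heartbeat has arrived within the timeout, i.e. it is time to reconnect.
  bool expired(Clock::time_point now) const
  {
    return armed_ && (now - last_) > timeout_;
  }

private:
  std::chrono::milliseconds timeout_;
  Clock::time_point last_;
  bool armed_;
};
```

The framework would call {{received()}} from its event handler and check {{expired()}} from a periodic timer; when it returns true, it would issue the {{reconnect}} request through the scheduler library and call {{reset()}} until the next SUBSCRIBED event arrives. Treating a disconnected framework as "not expired" is a deliberate choice here, so the monitor cannot produce a tight reconnect loop on its own.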