[
https://issues.apache.org/jira/browse/MESOS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526506#comment-14526506
]
Littlestar commented on MESOS-2679:
-----------------------------------
Thanks.
It may be caused by "network issues". Busy disk I/O or Spark Java GC can
also stall a node for more than 10 seconds and cause 'health check timed out'.
I will do more tests on Spark 1.3.1 + Mesos 0.22.1.
But I think the Mesos slave needs "a watchdog process": "network issues" are a
normal problem in a production environment, so I opened a new issue, MESOS-2685.
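
To make the "watchdog process" idea concrete, here is a minimal sketch of what
such a supervisor could look like. It is not part of Mesos and not what
MESOS-2685 specifies; the function name `supervise` and its parameters are
hypothetical, and it assumes only that the supervised process exits after it is
asked to shut down.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, delay_secs=1.0, sleep=time.sleep):
    """Run cmd and restart it each time it exits, up to max_restarts times.

    Hypothetical helper for illustration only. Returns the list of exit
    codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the child exits, for example after it was told
        # to shut down because a health check timed out.
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if attempt < max_restarts:
            # Short pause before restarting so a crash loop does not spin.
            sleep(delay_secs)
    return exit_codes


# Example call (the command line is a placeholder, adjust to your setup):
# supervise(["mesos-slave", "--master=<master-host>:5050"], max_restarts=5)
```

In a real deployment an init system or process supervisor (for example systemd
with `Restart=always`) would probably be a better fit than a hand-written loop.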
> Slave asked to shut down by master because 'health check timed out'
> -------------------------------------------------------------------
>
> Key: MESOS-2679
> URL: https://issues.apache.org/jira/browse/MESOS-2679
> Project: Mesos
> Issue Type: Bug
> Components: isolation
> Affects Versions: 0.22.1
> Reporter: Littlestar
>
> I run Spark 1.3.1 on Mesos 0.22.1 rc6 (linux64), and some Mesos slave nodes
> go offline.
> slave node logs:
> I0430 15:12:12.737057 32354 slave.cpp:571] Slave asked to shut down by
> [email protected]:5050 because 'health check timed out'
> master node logs:
> I0430 15:12:00.615777 19759 master.cpp:237] Shutting down slave
> 20150430-141442-1214949568-5050-19747-S2 due to health check timeout
> W0430 15:12:00.616083 19751 master.cpp:3417] Shutting down slave
> 20150430-141442-1214949568-5050-19747-S2 at slave(1)@192.168.1.15:5051
> (hpblade05) with message 'health check timed out'
> Why does the master take the slave offline, and why does the slave not
> restart itself?
> Is there any configuration to increase this timeout interval?