[jira] [Commented] (MESOS-2679) Slave asked to shut down by master because 'health check timed out'

Littlestar (JIRA) Mon, 04 May 2015 03:26:24 -0700

    [ 
https://issues.apache.org/jira/browse/MESOS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526506#comment-14526506
 ]


Littlestar commented on MESOS-2679:
-----------------------------------

Thanks.

Maybe it is caused by "network issues". Busy disk I/O or Spark Java GC pauses 
can also stall a node for more than 10 seconds and trigger 'health check timed out'.
I will do more tests on Spark 1.3.1 + Mesos 0.22.1.

But I think the Mesos slave needs "a watchdog process". "Network issues" are a 
normal problem in a production environment, so I posted a new issue, MESOS-2685.
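
A minimal sketch of the kind of watchdog meant here, assuming a plain supervisor loop that restarts a command whenever it exits with an error. The function name `supervise`, its parameters, and the placeholder command are hypothetical illustrations and not part of Mesos:

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay_secs=0.1):
    """Run cmd and restart it each time it exits with a non-zero status.

    Hypothetical helper for illustration only; not a Mesos API.
    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        # Blocks until the child process exits.
        status = subprocess.call(cmd)
        if status == 0:
            # Clean exit: assume the process was stopped on purpose.
            return restarts
        if restarts >= max_restarts:
            # Give up so a permanently broken process does not loop forever.
            return restarts
        restarts += 1
        time.sleep(delay_secs)


if __name__ == "__main__":
    # Placeholder command that always fails, standing in for the slave binary.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(failing, max_restarts=2))
```

The restart limit and delay are arbitrary choices for the sketch. Whether a slave that was shut down by the master exits with a non-zero status is an assumption here and would need to be checked.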




> Slave asked to shut down by master because 'health check timed out'
> -------------------------------------------------------------------
>
>                 Key: MESOS-2679
>                 URL: https://issues.apache.org/jira/browse/MESOS-2679
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation
>    Affects Versions: 0.22.1
>            Reporter: Littlestar
>
> I run Spark 1.3.1 on Mesos 0.22.1 rc6 (linux64), and some Mesos slave nodes 
> go offline.
> slave node logs:
> I0430 15:12:12.737057 32354 slave.cpp:571] Slave asked to shut down by 
> [email protected]:5050 because 'health check timed out'
> master node logs:
> I0430 15:12:00.615777 19759 master.cpp:237] Shutting down slave 
> 20150430-141442-1214949568-5050-19747-S2 due to health check timeout
> W0430 15:12:00.616083 19751 master.cpp:3417] Shutting down slave 
> 20150430-141442-1214949568-5050-19747-S2 at slave(1)@192.168.1.15:5051 
> (hpblade05) with message 'health check timed out'
> Why does the master take the slave offline, and why does the slave not restart itself? 
> Is there any configuration to increase this timeout interval?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
