[ https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056802#comment-16056802 ]

Siddharth Seth commented on HIVE-16927:
---------------------------------------

[~prasanth_j] - I don't think we should make a permanent change that sets this
to 0. A bad instance will never stop on its own and will keep trying to launch
new containers.
A better default would likely be numInstances, while making sure the value is
not too low (6 is the default, for example) and is high enough to allow a node
to be blacklisted.
Option1: numInstances * threshold to mark a node as disabled.
Option2: max(6, max(numInstances, threshold to mark a node as disabled))
Option3: ?
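The two concrete options differ in how fast they grow and whether they have a floor. The following is a minimal, hypothetical Java sketch that computes both for a few cluster sizes. The names `numInstances`, `nodeDisableThreshold` and `MIN_THRESHOLD` are illustrative assumptions, not actual Hive or Slider configuration keys, and the floor of 6 is taken from the default mentioned above.

```java
/**
 * Hypothetical sketch of the candidate defaults for the Slider failure threshold.
 * numInstances, nodeDisableThreshold and MIN_THRESHOLD are illustrative names,
 * not real Hive/Slider config keys.
 */
public final class FailureThresholdOptions {
    // Assumed floor, taken from the "6 is the default" remark in the comment.
    static final int MIN_THRESHOLD = 6;

    private FailureThresholdOptions() {}

    /** Option1: numInstances * threshold to mark a node as disabled (no floor applied). */
    static int option1(int numInstances, int nodeDisableThreshold) {
        return numInstances * nodeDisableThreshold;
    }

    /** Option2: max(6, max(numInstances, threshold to mark a node as disabled)). */
    static int option2(int numInstances, int nodeDisableThreshold) {
        return Math.max(MIN_THRESHOLD, Math.max(numInstances, nodeDisableThreshold));
    }

    public static void main(String[] args) {
        // Each pair is {numInstances, nodeDisableThreshold}.
        int[][] cases = {{2, 3}, {10, 3}, {4, 8}};
        for (int[] c : cases) {
            System.out.printf("numInstances=%d, nodeDisableThreshold=%d -> option1=%d, option2=%d%n",
                c[0], c[1], option1(c[0], c[1]), option2(c[0], c[1]));
        }
    }
}
```

Under these assumptions, Option1 grows multiplicatively with cluster size (10 instances with a node threshold of 3 gives 30) but can drop below 6 on very small clusters, while Option2 never goes below 6 and grows only linearly (the same inputs give 10).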

An enhancement request to Slider to get better control over this 

> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16927
>                 URL: https://issues.apache.org/jira/browse/HIVE-16927
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider considers the application to be
> in an unstable state and brings down all LLAP daemons.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
