[
https://issues.apache.org/jira/browse/HIVE-16927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056802#comment-16056802
]
Siddharth Seth commented on HIVE-16927:
---------------------------------------
[~prasanth_j] - I don't think we should make a permanent change that sets this
to 0. A bad instance will never stop on its own, and will keep trying to launch
new containers.
A better default would likely be numInstances, while making sure the value is
not too low (6 is the default, for example) and is high enough to allow a
node to be blacklisted.
Option 1: numInstances * (threshold to mark a node as disabled)
Option 2: max(6, max(numInstances, threshold to mark a node as disabled))
Option 3: ?
Separately, an enhancement request to Slider would help to get better control
over this.
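
To compare Options 1 and 2 concretely, here is a rough sketch of the two
formulas. The function and parameter names are made up for illustration and
are not Slider or Hive APIs; node_disable_threshold stands in for "threshold
to mark a node as disabled", and the floor of 6 is the default mentioned above.

```python
# Illustrative only: hypothetical helpers, not part of Slider or Hive.

def option1(num_instances: int, node_disable_threshold: int) -> int:
    """Option 1: scale the failure limit by both values."""
    return num_instances * node_disable_threshold


def option2(num_instances: int, node_disable_threshold: int, floor: int = 6) -> int:
    """Option 2: take the larger of the two values, but never go below the floor."""
    return max(floor, max(num_instances, node_disable_threshold))


if __name__ == "__main__":
    # Assumed example inputs: (number of daemons, per-node disable threshold).
    for n, t in [(2, 3), (10, 3), (50, 3)]:
        print(f"numInstances={n} threshold={t} "
              f"option1={option1(n, t)} option2={option2(n, t)}")
```

Option 1 grows with the cluster size times the per-node threshold, while
Option 2 stays close to numInstances and only guards against values that are
too low.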
> LLAP: Slider takes down all daemons when some daemons fail repeatedly
> ---------------------------------------------------------------------
>
> Key: HIVE-16927
> URL: https://issues.apache.org/jira/browse/HIVE-16927
> Project: Hive
> Issue Type: Bug
> Components: llap
> Affects Versions: 3.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-16927.1.patch
>
>
> When some containers fail repeatedly, Slider thinks the application is in
> an unstable state, which brings down all LLAP daemons.