[ https://issues.apache.org/jira/browse/SLIDER-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186065#comment-16186065 ]
Billie Rinaldi commented on SLIDER-1246:
----------------------------------------
[~gsaha], thanks for the new patch! I think we can still clean up the
global/final config handling in scheduleHealthThresholdMonitor. It is not
needed, because global properties have already been propagated to the component
properties: if a property exists in global and not in the component, it will
have been copied to the component.
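
To illustrate the point, here is a minimal sketch of that propagation,
assuming component values take precedence over global ones. The class, the
method name, and the property keys are hypothetical and are not taken from
the patch or from Slider's API. Once the merge has happened, the monitor only
needs to read the component properties.

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalPropagationSketch {

    // Copy every global property into the component's view unless the
    // component already defines it (assumption: component values win).
    static Map<String, String> mergeGlobals(Map<String, String> global,
                                            Map<String, String> component) {
        Map<String, String> merged = new HashMap<>(global);
        merged.putAll(component);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> global = new HashMap<>();
        // Hypothetical property keys, used only for illustration.
        global.put("health.threshold.percent", "80");
        global.put("health.threshold.window.secs", "600");

        Map<String, String> component = new HashMap<>();
        component.put("health.threshold.percent", "90"); // component override

        Map<String, String> effective = mergeGlobals(global, component);

        // The overridden key keeps the component value; the other key is
        // inherited from global, so no separate global lookup is needed.
        System.out.println(effective.get("health.threshold.percent"));     // 90
        System.out.println(effective.get("health.threshold.window.secs")); // 600
    }
}
```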
Secondly, I realized this morning that there will be an issue if unique
component names are enabled. When unique component names are enabled, there is a
separate ProviderRole and RoleStatus for each instance (solr1, solr2, etc.) and
the desired count for each is 1 (or 0), so the desired count for the role group
can’t be obtained from the RoleStatus.
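
One possible way to get a group-level count is to sum the per-instance
desired counts by group. The sketch below assumes each instance carries the
name of its role group. InstanceRole is a hypothetical stand-in, not Slider's
RoleStatus or ProviderRole.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoleGroupCountSketch {

    // Hypothetical per-instance state; not Slider's RoleStatus.
    static final class InstanceRole {
        final String name;   // e.g. "solr1"
        final String group;  // e.g. "solr"
        final int desired;   // 1 or 0 when unique component names are enabled

        InstanceRole(String name, String group, int desired) {
            this.name = name;
            this.group = group;
            this.desired = desired;
        }
    }

    // Sum the desired counts of all instances that share a role group.
    static Map<String, Integer> desiredByGroup(List<InstanceRole> roles) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (InstanceRole role : roles) {
            totals.merge(role.group, role.desired, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<InstanceRole> roles = new ArrayList<>();
        roles.add(new InstanceRole("solr1", "solr", 1));
        roles.add(new InstanceRole("solr2", "solr", 1));
        roles.add(new InstanceRole("solr3", "solr", 0));

        // Each instance reports 1 or 0, but the group as a whole wants 2.
        System.out.println(desiredByGroup(roles).get("solr")); // 2
    }
}
```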
If you have an app or unit test that you are using for testing, I would
recommend running the same test with and without unique component names
enabled. I would expect the behavior to be the same in both cases.
> Application health should not be affected by faulty nodes
> ---------------------------------------------------------
>
> Key: SLIDER-1246
> URL: https://issues.apache.org/jira/browse/SLIDER-1246
> Project: Slider
> Issue Type: Bug
> Components: appmaster, core
> Affects Versions: Slider 0.92
> Reporter: Prasanth Jayachandran
> Assignee: Gour Saha
> Fix For: Slider 1.0.0
>
> Attachments: SLIDER-1246.01.patch, SLIDER-1246.02.patch,
> SLIDER-1246.03.patch
>
>
> In case of a faulty node, multiple container failures will be deemed an
> application failure.
> Observed this in HIVE-16927, where container failures on certain nodes bring
> down the entire application. Slider has to provide a way to not mark the
> application as unhealthy if a certain threshold of containers is running.
> Tuning the failure threshold is not optimal, as setting the correct default on
> a large cluster is not trivial. Beyond a certain number of failures, Slider
> should mark the node as unhealthy and report that back to the client/AM. The
> application could continue to run as long as the container request is
> partially satisfied (example: 80% of containers are running).
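
The 80% example in the description can be read as a simple ratio check. The
following is a minimal sketch under that reading, not the logic of the
attached patches. The class and method names are hypothetical, and treating a
desired count of zero as healthy is an assumption.

```java
public class HealthThresholdSketch {

    // Returns true when running/desired is at least thresholdPercent/100.
    // Integer arithmetic is used to avoid floating-point rounding.
    static boolean isHealthy(int running, int desired, int thresholdPercent) {
        if (desired <= 0) {
            return true; // assumption: nothing requested means nothing failed
        }
        return (long) running * 100 >= (long) desired * thresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(isHealthy(8, 10, 80)); // true: exactly 80% running
        System.out.println(isHealthy(7, 10, 80)); // false: only 70% running
    }
}
```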