[
https://issues.apache.org/jira/browse/ACCUMULO-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051846#comment-14051846
]
Sean Busbey commented on ACCUMULO-2976:
---------------------------------------
I think handling it ourselves is more consistent with operations for other projects, such as HDFS and YARN, which track and blacklist bad nodes themselves.
> blacklist problematic tservers
> ------------------------------
>
> Key: ACCUMULO-2976
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2976
> Project: Accumulo
> Issue Type: Improvement
> Components: master
> Reporter: Sean Busbey
> Priority: Minor
>
> It would be nice if the master kept track of tservers that misbehave and
> eventually blacklisted them, similar to how HDFS handles datanodes and
> MapReduce/YARN handle trackers.
> Right now, the closest we come is the Master killing the zoolock for
> tservers that are behaving poorly. This causes them to exit if they're not in
> a zombie state.
> On deployments with a watchdog that relaunches failed processes, this doesn't
> help much because the tserver comes back. In the case of, e.g., flaky network
> failures on the node, this just means repeating the process and impacting
> cluster performance while the master works out that it should kill the node
> again.
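A minimal sketch of the bookkeeping the proposal implies, written in Java because Accumulo is implemented in Java. `TserverBlacklist`, its constructor parameters, and its methods are hypothetical and are not part of Accumulo's API. The sketch assumes the master can report each misbehavior, for example each time it kills a tserver's zoolock, together with a timestamp. It keys failures by host rather than by ZooKeeper session so that a tserver relaunched by a watchdog keeps its history. The thresholds and the time-based expiry are illustrative assumptions and are not taken from HDFS or YARN.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Accumulo API: counts recent failures per tserver
 * host and blacklists a host that misbehaves too often within a time window.
 */
class TserverBlacklist {
  private final int maxFailures;      // failures tolerated within the window
  private final long windowMillis;    // sliding window used to count failures
  private final long blacklistMillis; // how long a blacklisted host stays out

  // Keyed by host, not by ZooKeeper session, so a tserver that a watchdog
  // relaunches on the same host does not start over with a clean record.
  private final Map<String, Deque<Long>> failures = new HashMap<>();
  private final Map<String, Long> blacklistedUntil = new HashMap<>();

  TserverBlacklist(int maxFailures, long windowMillis, long blacklistMillis) {
    this.maxFailures = maxFailures;
    this.windowMillis = windowMillis;
    this.blacklistMillis = blacklistMillis;
  }

  /** Record one misbehavior, e.g. the master killing this host's zoolock. */
  synchronized void recordFailure(String host, long nowMillis) {
    Deque<Long> times = failures.computeIfAbsent(host, h -> new ArrayDeque<>());
    times.addLast(nowMillis);
    // Drop failures that have fallen out of the sliding window.
    while (!times.isEmpty() && nowMillis - times.peekFirst() > windowMillis) {
      times.removeFirst();
    }
    if (times.size() >= maxFailures) {
      blacklistedUntil.put(host, nowMillis + blacklistMillis);
      times.clear(); // start counting afresh once the blacklist expires
    }
  }

  /** True if the master should not assign tablets to this host right now. */
  synchronized boolean isBlacklisted(String host, long nowMillis) {
    Long until = blacklistedUntil.get(host);
    if (until == null) {
      return false;
    }
    if (nowMillis >= until) {
      blacklistedUntil.remove(host); // expired: give the host another chance
      return false;
    }
    return true;
  }
}
```

With `maxFailures = 3` and a 60-second window, three lock kills on the same host within a minute keep that host out for `blacklistMillis`, even if the tserver process keeps coming back. Failures spaced further apart never accumulate, so a single transient network blip does not get a node banned. Letting the blacklist expire, rather than making it permanent, is one possible design choice. It allows a node whose network problem was fixed to rejoin without operator action.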