Sean Busbey created ACCUMULO-2976:
-------------------------------------
Summary: blacklist problematic tservers
Key: ACCUMULO-2976
URL: https://issues.apache.org/jira/browse/ACCUMULO-2976
Project: Accumulo
Issue Type: Improvement
Components: master
Reporter: Sean Busbey
Priority: Minor
It would be nice if the master kept track of tservers that misbehave and
eventually blacklisted them, similar to how HDFS handles datanodes and
MapReduce/YARN handle trackers.
Right now the closest thing we do is have the Master kill the zoolock for
tservers that are behaving poorly. This causes them to exit if they're not in a
zombie state.
On deployments with a watchdog that relaunches failed processes, this doesn't
help much because the tserver comes back. In the case of, e.g., flaky network
failures on the node, this just means repeating the process and hurting
cluster performance while the master works out that it should kill the node
again.
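To make the idea concrete, here is a rough sketch of the kind of bookkeeping
I have in mind. It is illustrative only and not Accumulo code: the class name
TserverBlacklist, its methods, and the threshold and window defaults are all
hypothetical. The one point it tries to show is that failures are counted per
host rather than per process, so a watchdog relaunch does not reset the
history.

```python
from collections import defaultdict, deque


class TserverBlacklist:
    """Hypothetical sketch; not part of Accumulo's actual master code."""

    def __init__(self, max_failures=3, window_seconds=600.0):
        # Both defaults are made-up values for illustration.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # Keyed by host, not by process or lock session, so a relaunch by a
        # watchdog does not reset the failure history.
        self._failures = defaultdict(deque)
        self._blacklisted = set()

    def record_failure(self, host, now):
        """Record that the master had to kill the tserver on `host` at `now`."""
        events = self._failures[host]
        events.append(now)
        # Drop events that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.max_failures:
            self._blacklisted.add(host)

    def is_blacklisted(self, host):
        return host in self._blacklisted

    def clear(self, host):
        """Operator action: forgive a host once the underlying fault is fixed."""
        self._blacklisted.discard(host)
        self._failures.pop(host, None)
```

Under this sketch, the master would call record_failure each time it kills a
tserver's zoolock and would check is_blacklisted before accepting a tserver
from that host again. Whether a blacklisted host should be cleared
automatically after some time or only by an operator is left open here.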
--
This message was sent by Atlassian JIRA
(v6.2#6252)