[
https://issues.apache.org/jira/browse/HBASE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037791#comment-13037791
]
dhruba borthakur commented on HBASE-3833:
-----------------------------------------
There are actually two use-cases that triggered this JIRA.
1. There are times when the administrator wants to shut down a few region
servers, usually to upgrade hardware or for some similar reason. In this case,
the administrator can put these machines in the excludes list and wait for
those region servers to shut down gracefully.
2. The other use-case is when certain region servers become unresponsive. Twice
it has happened that a regionserver was still heartbeating with ZK, but its
capacity to process HBase workload had suddenly fallen to almost zero. We could
not ssh into the machine to debug what was wrong. (The suspicion is that the
machine had started swapping.) In this case, it would be nice if the
administrator had an option to put the machine in the excludes list and wait a
few minutes for it to exit gracefully; if it still does not exit, the
regionserver would then be forcefully declared "dead".
In short, maybe we need a "force" option in decommissioning which, if used,
does not wait for a graceful shutdown of the specified regionserver and instead
declares it dead immediately, then follows the normal course of action
(lease recovery, region reassignment, etc.).
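To make the proposal a bit more concrete, here is a rough Java sketch of the
decision logic only. Every name in it (DecommissionSketch, Outcome,
decommission, the hasExited probe, the grace and poll parameters) is
hypothetical: none of it is an existing HBase API or taken from the attached
patch, and a real implementation would live in the master and would also have
to trigger lease recovery and region reassignment.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of "graceful first, then force"; not an HBase API. */
final class DecommissionSketch {

    /** Possible results of a decommission request in this sketch. */
    enum Outcome { EXITED_GRACEFULLY, DECLARED_DEAD, STILL_RUNNING }

    /**
     * Waits up to graceMillis for an excluded regionserver to exit.
     *
     * @param hasExited   hypothetical probe: true once the server has exited
     * @param graceMillis how long to wait for a graceful exit (0 = do not wait)
     * @param pollMillis  pause between probes
     * @param force       if true, declare the server dead when the wait expires
     */
    static Outcome decommission(BooleanSupplier hasExited,
                                long graceMillis,
                                long pollMillis,
                                boolean force) {
        long deadline = System.nanoTime() + graceMillis * 1_000_000L;
        while (true) {
            if (hasExited.getAsBoolean()) {
                return Outcome.EXITED_GRACEFULLY;
            }
            if (System.nanoTime() - deadline >= 0) {
                break; // grace period is over
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Stop waiting, but do not declare anything dead on interrupt.
                Thread.currentThread().interrupt();
                return Outcome.STILL_RUNNING;
            }
        }
        // Without "force" the server is merely excluded and left alone.
        return force ? Outcome.DECLARED_DEAD : Outcome.STILL_RUNNING;
    }
}
```

With force set and graceMillis at 0 this behaves like the immediate "force"
option above; with a grace period of a few minutes it covers the second
use-case, where the server is given a chance to exit before being declared
dead.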
> ability to support includes/excludes list in Hbase
> --------------------------------------------------
>
> Key: HBASE-3833
> URL: https://issues.apache.org/jira/browse/HBASE-3833
> Project: HBase
> Issue Type: Improvement
> Components: client, regionserver
> Affects Versions: 0.90.2
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: excl-patch.txt, excl-patch.txt
>
>
> An HBase cluster currently does not have the ability to specify that the
> master should accept regionservers only from a specified list. Such a list
> would help prevent administrative errors where the same machine is included
> in two clusters. It would also allow the administrator to easily remove
> un-ssh-able machines from the cluster.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira