[
https://issues.apache.org/jira/browse/HBASE-22254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824251#comment-16824251
]
stack commented on HBASE-22254:
-------------------------------
bq. 1) More resilient off-loading; right now off-loading fails for a subset of
regions in case of a single region failure; is never done on master restart,
etc.
Say more.
bq. 2) Option to kill RS after off-loading (good for container mode HBase, e.g.
on YARN).
Add how to invoke this new option to the release note (RN)?
bq. 3) Option to specify machine names only to decommission, for the API to be
usable for an external system that doesn't care about HBase server names, or
e.g. multiple RS in containers on the same node.
Sounds good (ditto on RN).
bq. 4) Option to replace existing decommissioning list instead of adding to it
(the same; to avoid additionally remembering what was previously sent to HBase).
Sounds good.
On the patch, the option optional bool killAfterOffload = 2; is named
differently in your two .proto file changes. Make them the same?
On DecommissionState in zk.proto, does it have to go into .proto? Can it not be
a procedure? (could be a follow-on).
Do we have to have another 'manager'? Decommissioning seems like a
ServerManager charge? ( /** Relies on servermanager from master services in
ctor. */)
This ok?
    if (master.getZooKeeper() != null) {
      this.parentZNode = this.master.getZooKeeper().getZNodePaths().drainingZNode;
    } else {
      this.parentZNode = null; // Test path.
    }
We could construct the manager w/o znode? Or is it that this can happen in
tests or something? Also, in start, you repeat the master.getZooKeeper() null
check. Maybe init your znode there so you only do it once?
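Something like the below is what I mean; a rough sketch only, not the patch's
code. The class name, the ZkPaths interface and the "/hbase/draining" value
are made-up stand-ins so the snippet compiles on its own; the point is just
that the null check and the znode lookup happen in one place, in start.

```java
import java.util.Optional;

/** Hypothetical stand-in for the manager under review; all names are illustrative. */
class DecommissionManagerSketch {

  /** Hypothetical minimal view of the master's ZooKeeper handle. */
  interface ZkPaths {
    String drainingZNode();
  }

  private final ZkPaths zk;    // null when there is no ZooKeeper (test path)
  private String parentZNode;  // resolved exactly once, in start()

  DecommissionManagerSketch(ZkPaths zk) {
    // The constructor only stores the handle; it does no null check or znode lookup.
    this.zk = zk;
  }

  void start() {
    // Single place for the null check: resolve the znode here, or leave it null in tests.
    this.parentZNode = (zk != null) ? zk.drainingZNode() : null;
  }

  /** Empty when the manager runs without ZooKeeper. */
  Optional<String> getParentZNode() {
    return Optional.ofNullable(parentZNode);
  }

  public static void main(String[] args) {
    DecommissionManagerSketch withZk =
        new DecommissionManagerSketch(() -> "/hbase/draining"); // example path only
    withZk.start();
    System.out.println(withZk.getParentZNode().isPresent());

    DecommissionManagerSketch noZk = new DecommissionManagerSketch(null);
    noZk.start();
    System.out.println(noZk.getParentZNode().isPresent());
  }
}
```

Callers that need the znode would then go through getParentZNode() rather than
re-checking master.getZooKeeper() themselves.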
On 'start', could it implement the Guava Service Interface? nit.
Yeah, adding extra server listener seems to be an argument that this be a
facility in ServerManager rather than new Manager.
Does this have to be public: public List<ServerName> getDrainingServersList() { ?
Stopped review about 1/3rd in. I like the cleanup and test.
> refactor and improve decommissioning logic
> ------------------------------------------
>
> Key: HBASE-22254
> URL: https://issues.apache.org/jira/browse/HBASE-22254
> Project: HBase
> Issue Type: Improvement
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HBASE-22254.01.patch, HBASE-22254.patch
>
>
> Making some changes needed to support better decommissioning on large
> clusters and with container mode; to test those and add clarity I moved parts
> of decommissioning logic from HMaster, Draining tracker, and ServerManager
> into a separate class.
> Features added/improvements:
> 1) More resilient off-loading; right now off-loading fails for a subset of
> regions in case of a single region failure; is never done on master restart,
> etc.
> 2) Option to kill RS after off-loading (good for container mode HBase, e.g.
> on YARN).
> 3) Option to specify machine names only to decommission, for the API to be
> usable for an external system that doesn't care about HBase server names, or
> e.g. multiple RS in containers on the same node.
> 4) Option to replace existing decommissioning list instead of adding to it
> (the same; to avoid additionally remembering what was previously sent to
> HBase).
> 5) Tests, comments ;)