[jira] [Commented] (HBASE-16853) Regions are assigned to Region Servers in /hbase/draining after HBase Master failover

Hudson (JIRA) Mon, 14 Nov 2016 14:05:54 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665193#comment-15665193
 ]


Hudson commented on HBASE-16853:
--------------------------------

ABORTED: Integrated in Jenkins build HBase-0.98-matrix #415 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/415/])
HBASE-16853 Regions are assigned to Region Servers in /hbase/draining 
(apurtell: rev dba43b62823cbaa663cf0c2f7b7e4dcd668bdbce)
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentListener.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/DrainingServerTracker.java


> Regions are assigned to Region Servers in /hbase/draining after HBase Master 
> failover
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-16853
>                 URL: https://issues.apache.org/jira/browse/HBASE-16853
>             Project: HBase
>          Issue Type: Bug
>          Components: Balancer, Region Assignment
>    Affects Versions: 2.0.0, 1.3.0
>            Reporter: David Pope
>            Assignee: David Pope
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.24
>
>         Attachments: 16853.v2.txt, HBASE-16853.branch-1.3-v1.patch, 
> HBASE-16853.branch-1.3-v2.patch
>
>
> h2. Problem
> If Region Servers are registered as "draining", their "draining" znodes 
> persist across an HMaster failover; however, the balancer on the new Active 
> HMaster still assigns regions to them.
> h2. How to reproduce (on hbase master):
> # Add regionserver to /hbase/draining: {{bin/hbase-jruby 
> bin/draining_servers.rb add server1:16205}}
> # Unload the regionserver:  {{bin/hbase-jruby bin/region_mover.rb unload 
> server1:16205}}
> # Kill the Active HMaster and failover to the Backup HMaster
> # Run the balancer: {{hbase shell <<< "balancer"}}
> # Notice that the new Active HMaster assigns regions to Region Servers 
> listed in /hbase/draining
> h2. Root Cause
> The Backup HMaster initializes the {{DrainingServerTracker}} before the 
> Region Servers are registered as "online" with the {{ServerManager}}.  As a 
> result, {{ServerManager.drainingServers}} isn't populated with the Region 
> Servers that are already in draining when an HMaster failover occurs.
> E.g., 
> # We have a region server in draining: {{server1,16205,1000}}
> # The {{RegionServerTracker}} starts up and adds a ZK watcher on the Znode 
> for this RegionServer: {{/hbase/rs/server1,16205,1000}}
> # The {{DrainingServerTracker}} starts and processes each Znode under 
> {{/hbase/draining}}, but the Region Server isn't registered as "online" so it 
> isn't added to the {{ServerManager.drainingServers}} list.
> # The Region Server is added to the {{DrainingServerTracker.drainingServers}} 
> list.
> # The Region Server's Znode watcher is triggered and the ZK watcher is 
> restarted.
> # The Region Server is registered with {{ServerManager}} as "online".
> *END STATE:* The Region Server has a Znode in {{/hbase/draining}} and is 
> registered as "online", but it is missing from 
> {{ServerManager.drainingServers}}, so the Balancer will start assigning 
> regions to it.
> {code}
> $ bin/hbase-jruby bin/draining_servers.rb list
> [1] server1,16205,1000
> $ grep server1,16205,1000 logs/master-server1.log
> 2016-10-14 16:02:47,713 DEBUG [server1:16001.activeMasterManager] 
> zookeeper.ZKUtil: master:16001-0x157c56adc810014, quorum=localhost:2181, 
> baseZNode=/hbase Set watcher on existing znode=/hbase/rs/server1,16205,1000
> [2] 2016-10-14 16:02:47,722 DEBUG [server1:16001.activeMasterManager] 
> zookeeper.RegionServerTracker: Added tracking of RS 
> /hbase/rs/server1,16205,1000
> 2016-10-14 16:02:47,730 DEBUG [server1:16001.activeMasterManager] 
> zookeeper.ZKUtil: master:16001-0x157c56adc810014, quorum=localhost:2181, 
> baseZNode=/hbase Set watcher on existing 
> znode=/hbase/draining/server1,16205,1000
> [3] 2016-10-14 16:02:47,731 WARN  [server1:16001.activeMasterManager] 
> master.ServerManager: Server server1,16205,1000 is not currently online. 
> Ignoring request to add it to draining list.
> [4] 2016-10-14 16:02:47,731 INFO  [server1:16001.activeMasterManager] 
> zookeeper.DrainingServerTracker: Draining RS node created, adding to list 
> [server1,16205,1000]
> 2016-10-14 16:02:47,971 DEBUG [main-EventThread] zookeeper.ZKUtil: 
> master:16001-0x157c56adc810014, quorum=localhost:2181, baseZNode=/hbase Set 
> watcher on existing 
> znode=/hbase/rs/dev6918.prn2.facebook.com,16205,1476486047114
> [5] 2016-10-14 16:02:47,976 DEBUG [main-EventThread] 
> zookeeper.RegionServerTracker: Added tracking of RS 
> /hbase/rs/server1,16205,1000
> [6] 2016-10-14 16:02:52,084 INFO  
> [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=16001] 
> master.ServerManager: Registering server=server1,16205,1000
> {code}
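
The ordering problem described above can be reproduced in a small, self-contained model. The sketch below is not HBase code and is not taken from the committed patch: `SketchServerManager`, `SketchDrainingTracker`, and the `replayOnRegistration` flag are hypothetical stand-ins, and plain strings replace `ServerName`. It only assumes what the description states: the draining tracker starts before the server registers, and the add-to-draining request is ignored for a server that is not yet online. With the flag set, the tracker re-checks its own list whenever a server registers, which is one possible way to close the gap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class DrainingRaceSketch {

    /** Hypothetical stand-in for the master's ServerManager. */
    static class SketchServerManager {
        private final Set<String> online = new HashSet<>();
        private final Set<String> draining = new HashSet<>();
        private final List<Consumer<String>> listeners = new ArrayList<>();

        void registerListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        /** Ignores the request if the server is not online yet (log entry [3]). */
        boolean addServerToDrainList(String serverName) {
            if (!online.contains(serverName)) {
                return false;
            }
            return draining.add(serverName);
        }

        /** Marks the server online (log entry [6]) and notifies listeners. */
        void registerServer(String serverName) {
            online.add(serverName);
            for (Consumer<String> listener : listeners) {
                listener.accept(serverName);
            }
        }

        /** Servers the balancer may pick: online and not draining. */
        Set<String> assignableServers() {
            Set<String> result = new HashSet<>(online);
            result.removeAll(draining);
            return result;
        }
    }

    /** Hypothetical stand-in for DrainingServerTracker. */
    static class SketchDrainingTracker {
        private final Set<String> drainingZnodes = new HashSet<>();
        private final SketchServerManager manager;

        SketchDrainingTracker(SketchServerManager manager, boolean replayOnRegistration) {
            this.manager = manager;
            if (replayOnRegistration) {
                // Assumed remedy: when a server comes online later, re-submit it
                // to the manager if the tracker already knows it is draining.
                manager.registerListener(serverName -> {
                    if (drainingZnodes.contains(serverName)) {
                        manager.addServerToDrainList(serverName);
                    }
                });
            }
        }

        /** Processes the znodes found under /hbase/draining at startup. */
        void start(Set<String> znodes) {
            for (String serverName : znodes) {
                manager.addServerToDrainList(serverName); // ignored if not online
                drainingZnodes.add(serverName);           // log entry [4]
            }
        }
    }

    /** Replays the failover ordering: tracker starts first, server registers later. */
    public static Set<String> simulateFailover(boolean replayOnRegistration) {
        String server = "server1,16205,1000";
        SketchServerManager manager = new SketchServerManager();
        SketchDrainingTracker tracker =
            new SketchDrainingTracker(manager, replayOnRegistration);
        tracker.start(Collections.singleton(server));
        manager.registerServer(server);
        return manager.assignableServers();
    }

    public static void main(String[] args) {
        System.out.println("without replay: " + simulateFailover(false));
        System.out.println("with replay:    " + simulateFailover(true));
    }
}
```

Without the replay, the draining server ends up in the assignable set, which matches the end state in the description; with it, the assignable set is empty.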



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
