David Pope created HBASE-16853:
----------------------------------

             Summary: Regions are assigned to Region Servers in /hbase/draining 
after HBase Master failover
                 Key: HBASE-16853
                 URL: https://issues.apache.org/jira/browse/HBASE-16853
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 2.0.0, 1.3.0
            Reporter: David Pope
            Assignee: David Pope


h2. Problem
If there are Region Servers registered as "draining", they will continue to 
have "draining" znodes after an HMaster failover; however, the balancer on 
the new Active HMaster will still assign regions to them.

h2. How to reproduce (on hbase master):
# Add the Region Server to /hbase/draining: {{bin/hbase-jruby bin/draining_servers.rb add server1:16205}}
# Unload the Region Server: {{bin/hbase-jruby bin/region_mover.rb unload server1:16205}}
# Kill the Active HMaster and fail over to the Backup HMaster
# Run the balancer: {{hbase shell <<< "balancer"}}
# Notice that the new Active HMaster assigns regions to Region Servers that 
are still listed in /hbase/draining

h2. Root Cause
The Backup HMaster initializes the {{DrainingServerTracker}} before the Region 
Servers are registered as "online" with the {{ServerManager}}. As a result, 
{{ServerManager.drainingServers}} isn't populated with the Region Servers 
that are already in draining when an HMaster failover occurs.

E.g., 
# We have a region server in draining: {{server1,16205,1000}}
# The {{RegionServerTracker}} starts up and adds a ZK watcher on the Znode for 
this RegionServer: {{/hbase/rs/server1,16205,1000}}
# The {{DrainingServerTracker}} starts and processes each Znode under 
{{/hbase/draining}}, but the Region Server isn't registered as "online" so it 
isn't added to the {{ServerManager.drainingServers}} list.
# The Region Server is added to the {{DrainingServerTracker.drainingServers}} 
list.
# The Region Server's Znode watcher is triggered and the ZK watcher is 
restarted.
# The Region Server is registered with {{ServerManager}} as "online".

*END STATE:* The Region Server has a Znode in {{/hbase/draining}}, but it is 
registered as "online" and is missing from {{ServerManager.drainingServers}}, 
so the Balancer will start assigning regions to it.

{code}
$ bin/hbase-jruby bin/draining_servers.rb list
[1] server1,16205,1000

$ grep server1,16205,1000 logs/master-server1.log
2016-10-14 16:02:47,713 DEBUG [server1:16001.activeMasterManager] 
zookeeper.ZKUtil: master:16001-0x157c56adc810014, quorum=localhost:2181, 
baseZNode=/hbase Set watcher on existing znode=/hbase/rs/server1,16205,1000

[2] 2016-10-14 16:02:47,722 DEBUG [server1:16001.activeMasterManager] 
zookeeper.RegionServerTracker: Added tracking of RS /hbase/rs/server1,16205,1000

2016-10-14 16:02:47,730 DEBUG [server1:16001.activeMasterManager] 
zookeeper.ZKUtil: master:16001-0x157c56adc810014, quorum=localhost:2181, 
baseZNode=/hbase Set watcher on existing 
znode=/hbase/draining/server1,16205,1000

[3] 2016-10-14 16:02:47,731 WARN  [server1:16001.activeMasterManager] 
master.ServerManager: Server server1,16205,1000 is not currently online. 
Ignoring request to add it to draining list.

[4] 2016-10-14 16:02:47,731 INFO  [server1:16001.activeMasterManager] 
zookeeper.DrainingServerTracker: Draining RS node created, adding to list 
[server1,16205,1000]

2016-10-14 16:02:47,971 DEBUG [main-EventThread] zookeeper.ZKUtil: 
master:16001-0x157c56adc810014, quorum=localhost:2181, baseZNode=/hbase Set 
watcher on existing 
znode=/hbase/rs/dev6918.prn2.facebook.com,16205,1476486047114

[5] 2016-10-14 16:02:47,976 DEBUG [main-EventThread] 
zookeeper.RegionServerTracker: Added tracking of RS /hbase/rs/server1,16205,1000

[6] 2016-10-14 16:02:52,084 INFO  
[RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=16001] 
master.ServerManager: Registering server=server1,16205,1000
{code}
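
The ordering problem above can be reproduced in a small stand-alone model. This is only a sketch, not HBase source code: {{DrainingRaceSketch}}, {{SketchServerManager}}, {{replayFailover}} and {{replayExpectedOrder}} are hypothetical names, and the only behavior assumed is what log lines [3] and [6] show (a draining request for a server that is not yet online is ignored, and the server registers afterwards).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of the startup ordering described above; NOT HBase source code. */
public class DrainingRaceSketch {

    /** Hypothetical stand-in for the master's ServerManager. */
    public static class SketchServerManager {
        private final Set<String> onlineServers = new HashSet<>();
        private final List<String> drainingServers = new ArrayList<>();

        /** Models log line [3]: a server that is not online is ignored. */
        public boolean addServerToDrainList(String serverName) {
            if (!onlineServers.contains(serverName)) {
                return false; // request dropped, nothing remembers it
            }
            if (!drainingServers.contains(serverName)) {
                drainingServers.add(serverName);
            }
            return true;
        }

        /** Models log line [6]: the Region Server reports in to the new master. */
        public void registerServer(String serverName) {
            onlineServers.add(serverName);
        }

        /** Servers a balancer could pick in this model: online minus draining. */
        public List<String> assignableServers() {
            List<String> result = new ArrayList<>(onlineServers);
            result.removeAll(drainingServers);
            return result;
        }
    }

    /** Failover order: the draining tracker runs before the server is online. */
    public static List<String> replayFailover(String serverName) {
        SketchServerManager sm = new SketchServerManager();
        sm.addServerToDrainList(serverName); // ignored, server not online yet
        sm.registerServer(serverName);       // server comes online afterwards
        return sm.assignableServers();
    }

    /** Order without a failover: the server is online before it is drained. */
    public static List<String> replayExpectedOrder(String serverName) {
        SketchServerManager sm = new SketchServerManager();
        sm.registerServer(serverName);
        sm.addServerToDrainList(serverName);
        return sm.assignableServers();
    }

    public static void main(String[] args) {
        String rs = "server1,16205,1000";
        System.out.println("failover order: " + replayFailover(rs));
        System.out.println("expected order: " + replayExpectedOrder(rs));
    }
}
```

In this model the failover order leaves {{server1,16205,1000}} assignable, while the expected order leaves no assignable servers, which matches the END STATE described above.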


