Hi all,
We encountered a strange scenario in our HBase cluster (based on the 1.0
branch). The scenario is as follows.
There is a table (t1) in disabled state with a region (r1). Region r1 was last
assigned to Region Server RS1. For some time, network communication was broken
between the HMaster (HM1) and RS1.
During this period, a user tried to enable table t1 and the operation failed,
because region r1 could not be assigned to any of the live RSs. The assignment
was skipped in the forceRegionStateToOffline() method of AssignmentManager
due to the check below:

if (useZKForAssignment
    && regionStates.isServerDeadAndNotProcessed(sn)
    && wasRegionOnDeadServerByMeta(region, sn)) {
  // (body omitted here)
}

We found that regionStates.isServerDeadAndNotProcessed(sn) puts RS1 into its
deadServers map and then waits for SSH (ServerShutdownHandler) to process RS1.
That never happens, because the session between RS1 and ZK is still fine.

synchronized boolean isServerDeadAndNotProcessed(ServerName server) {
      -----
        if (serverManager.isServerReachable(server)) {
          return false;
        }
        // The size of deadServers won't grow unbounded.
        deadServers.put(hostAndPort, Long.valueOf(startCode));
      }
      // Watch out! If the server is not dead, the region could
      // remain unassigned. That's why ServerManager#isServerReachable
      // should use some retry.
      -----
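
To illustrate the "should use some retry" remark in the comment above, here is
a minimal sketch of a reachability probe wrapped in a bounded retry loop. It is
not HBase code: the class ReachabilityCheck, the method isReachableWithRetry
and its parameters are hypothetical names chosen for this example, and the
probe is passed in as a plain BooleanSupplier rather than a real RPC.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of HBase.
final class ReachabilityCheck {
    private ReachabilityCheck() {}

    /**
     * Returns true as soon as one probe succeeds. Returns false only after
     * maxAttempts consecutive failed probes, or if the thread is interrupted
     * while pausing between attempts.
     */
    static boolean isReachableWithRetry(BooleanSupplier probe,
                                        int maxAttempts,
                                        long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    // Pause before the next probe; no pause after the last one.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and report "not reachable".
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A retry like this only covers short glitches. If the network outage lasts
longer than all attempts together, the server is still recorded as dead, which
is the situation described in this mail.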

Even though the network recovered after some time, the table still could not
be enabled. This is because:

a)  The entry for RS1 is never removed from deadServers.

b)  Even if the entry is removed from deadServers, or the RS is aborted, the
table cannot be enabled because it is already in ENABLING state. The table
gets enabled only after a Master failover.
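
Point (a) can be reproduced with a small toy model. This is a sketch under
assumptions, not the HBase implementation: DeadServerModel is a hypothetical
class, the reachability probe is a plain Predicate, and the parts elided with
"-----" in the snippet quoted earlier are replaced by my own guess (a cached
entry is only re-checked when a newer start code shows up, and no shutdown
handler ever marks the server as processed).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Toy model for illustration only; NOT the HBase RegionStates class.
final class DeadServerModel {
    // hostAndPort -> start code recorded when the server was found unreachable
    private final Map<String, Long> deadServers = new HashMap<>();
    // Stand-in for the master's reachability probe (assumption).
    private final Predicate<String> reachable;

    DeadServerModel(Predicate<String> reachable) {
        this.reachable = reachable;
    }

    synchronized boolean isServerDeadAndNotProcessed(String hostAndPort, long startCode) {
        Long deadCode = deadServers.get(hostAndPort);
        // Assumption: the probe runs only if no entry is cached for this
        // start code (or an older one).
        if (deadCode == null || startCode > deadCode) {
            if (reachable.test(hostAndPort)) {
                return false;
            }
            deadServers.put(hostAndPort, startCode);
        }
        // Assumption: nothing in this model ever processes the server,
        // so a cached entry keeps the answer at "dead and not processed".
        return true;
    }

    synchronized boolean hasEntry(String hostAndPort) {
        return deadServers.containsKey(hostAndPort);
    }
}
```

In this model, one failed probe during the outage is enough: later calls with
the same start code return true without probing again, even after the probe
would succeed. The model does not cover point (b), the ENABLING table state.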

A similar scenario is also discussed in the JIRAs below:
https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
https://issues.apache.org/jira/browse/HBASE-6469


Please let us know how to handle this scenario, or whether there is any other
mechanism to recover from it.

Thanks
Bhupendra

