caroliney14 commented on pull request #2769:
URL: https://github.com/apache/hbase/pull/2769#issuecomment-745705944


   @Apache9 At the time of `reportForDuty`, there is no way to know whether RS 
is ready to take regions. After Master acknowledges the `reportForDuty` and 
sends the response back to the RS, RS performs all the actions within [this 
handleReportForDuty() 
method](https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L1494),
 including creating an ephemeral znode, initializing file system, setting up 
WAL and replication, setting up metrics, starting service threads, starting 
heap memory manager, and setting the RS internal `online` boolean to true, 
among other tasks. So, it seems that RS is doing a lot of vital and 
time-consuming setup *after* reporting for duty to Master.
   
   The flow is as follows:
   RS starts up -> RS sends `reportForDuty` to Master -> Master adds RS to 
Master's onlineServers list, and sends response to RS -> RS receives 
`reportForDuty` response from Master -> RS finishes setup, including setting up 
WAL and replication.
   
   The issue we observed, which led to the creation of this JIRA, was a ~20 min 
delay in RS initializing replication sources when RS was unable to connect to 
peer clusters because of a Kerberos configuration problem. Because Master 
already considers the RS online at that point, it tries to assign regions, 
which fails with `ServerNotRunningYetException`.
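
   To make the failure mode concrete, here is a minimal toy model of the 
original flow. It is only a sketch: `ToyMaster`, `ToyRegionServer`, and 
`NotRunningYet` are hypothetical stand-ins written for this comment, not HBase 
classes, and `NotRunningYet` merely plays the role of 
`ServerNotRunningYetException`.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the original flow. All names are hypothetical, not HBase APIs. */
final class OldFlowSketch {

    /** Stand-in for HBase's ServerNotRunningYetException. */
    static final class NotRunningYet extends RuntimeException {
        NotRunningYet(String message) { super(message); }
    }

    static final class ToyRegionServer {
        final String name;
        // Flipped to true only after WAL, replication, etc. are set up.
        private volatile boolean online = false;

        ToyRegionServer(String name) { this.name = name; }

        void finishSetup() { online = true; }

        void openRegion(String region) {
            if (!online) {
                throw new NotRunningYet(name + " is not running yet");
            }
            // A real RS would open the region here.
        }
    }

    static final class ToyMaster {
        final Set<String> onlineServers = new HashSet<>();

        // Original behaviour: the RS is listed as online as soon as it reports.
        void reportForDuty(ToyRegionServer rs) { onlineServers.add(rs.name); }

        void assign(ToyRegionServer rs, String region) {
            if (!onlineServers.contains(rs.name)) {
                throw new IllegalStateException(rs.name + " is not in onlineServers");
            }
            rs.openRegion(region); // fails while the RS is still finishing setup
        }
    }
}
```

   In this model, calling `assign` between `reportForDuty` and `finishSetup` 
throws `NotRunningYet`, which mirrors the window in which Master believes the 
RS is online but the RS has not finished its setup.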
   
   There are a few ways to address this issue, described 
[here](https://issues.apache.org/jira/browse/HBASE-25032?focusedCommentId=17241847&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17241847).
   
   The approach taken in this PR alters the handling of `reportForDuty` on the 
Master side. The new flow is as follows:
   RS starts up -> RS sends `reportForDuty` to Master -> Master acknowledges 
`reportForDuty` and sends a response to RS; at the same time, Master spawns a 
thread to poll for the RS `online` flag (i.e. RS setup complete) -> RS receives 
the 'reportForDuty received' acknowledgement from Master -> RS finishes setup 
and sets its `online` flag to true -> Master sees that RS has finished setup -> 
Master adds RS to Master's onlineServers list.
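
   The new flow can be sketched in the same toy style. This is an illustration 
under stated assumptions, not the code in this PR: `ToyMaster` and 
`ToyRegionServer` are hypothetical names, the 10 ms poll interval is arbitrary, 
and a `ScheduledExecutorService` stands in for the polling thread described 
above.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of the new flow. All names are hypothetical, not HBase APIs. */
final class NewFlowSketch {

    static final class ToyRegionServer {
        final String name;
        private final AtomicBoolean online = new AtomicBoolean(false);

        ToyRegionServer(String name) { this.name = name; }

        boolean isOnline() { return online.get(); }

        // Called once WAL, replication, etc. have been set up.
        void finishSetup() { online.set(true); }
    }

    static final class ToyMaster implements AutoCloseable {
        final Set<String> onlineServers = ConcurrentHashMap.newKeySet();

        // Daemon thread so a forgotten close() cannot keep the JVM alive.
        private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "rs-online-poller");
                t.setDaemon(true);
                return t;
            });

        /**
         * Acknowledges the report immediately (by returning) and polls the RS
         * in the background. The returned future completes once the RS has
         * been added to onlineServers.
         */
        CompletableFuture<Void> reportForDuty(ToyRegionServer rs) {
            CompletableFuture<Void> registered = new CompletableFuture<>();
            ScheduledFuture<?> task = poller.scheduleWithFixedDelay(() -> {
                if (rs.isOnline()) {
                    onlineServers.add(rs.name); // only now is the RS assignable
                    registered.complete(null);
                }
            }, 0, 10, TimeUnit.MILLISECONDS);
            // Stop polling once the RS has been registered.
            registered.thenRun(() -> task.cancel(false));
            return registered;
        }

        @Override
        public void close() { poller.shutdownNow(); }
    }
}
```

   The point of the sketch is the ordering: `onlineServers` does not contain 
the RS until `finishSetup()` has run, so Master never tries to assign regions 
to an RS that is still initializing.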



