[ https://issues.apache.org/jira/browse/HBASE-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341087#comment-17341087 ]
Andrew Kyle Purtell edited comment on HBASE-25032 at 5/8/21, 12:50 AM:
-----------------------------------------------------------------------

We are reverting this change from master, branch-1, branch-2, branch-2.3, and branch-2.4 due to HBASE-25774. This change can go back in after the issues are addressed. This was released in 2.3.5 but 2.3.5 will be withdrawn and replaced with 2.3.5.1, which will contain just a revert of this commit. See discussion on HBASE-25774.


was (Author: apurtell):
    We are reverting this change from master, branch-2, branch-2.3, and branch-2.4 due to HBASE-25774. This change can go back in after the issues are addressed. This was released in 2.3.5 but 2.3.5 will be withdrawn and replaced with 2.3.5.1, which will contain just a revert of this commit. See discussion on HBASE-25774.

> Wait for region server to become online before adding it to online servers in
> Master
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-25032
>                 URL: https://issues.apache.org/jira/browse/HBASE-25032
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Caroline Zhou
>            Priority: Major
>              Labels: master, regionserver
>             Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> As part of RS start up, the RS reports for duty to the Master. The Master
> acknowledges the request and adds the RS to the onlineServers list so that
> regions can be assigned to it.
> Once the Master acknowledges the reportForDuty and sends back the response,
> the RS still does a bunch of work, such as initializing replication sources,
> before becoming online. However, initializing replication sources can stall
> when the RS is unable to connect to peer clusters because of some Kerberos
> configuration, and there would be a delay of around 20 mins in becoming
> online.
>
> Since the Master already considers the RS online, it tries to assign regions,
> which fails with a ServerNotRunningYet exception. The Master then tries to
> unassign, which fails again with the same exception, leaving the region in
> FAILED_CLOSE state.
>
> It would be good to have a check to see if the RS is ready to accept
> assignment requests before adding it to the online servers list, which would
> account for any such delays as described above.
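
For illustration only, the check proposed in the description might be sketched roughly as follows. This is not the HBASE-25032 patch and not HBase's actual ServerManager API: the class ReadinessGatedServerTracker and its methods reportForDuty, markOnline and canAssignTo are hypothetical names made up for this sketch. The idea is that a reporting RS is acknowledged and held in a pending set, and only becomes eligible for assignment once it confirms that start-up has finished.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of readiness-gated registration on the Master side.
 * None of these names come from HBase; they only illustrate the idea.
 */
final class ReadinessGatedServerTracker {
    // Servers that have reported for duty but have not yet confirmed they are online.
    private final Set<String> pendingServers = new HashSet<>();
    // Servers that are eligible to receive region assignments.
    private final Set<String> onlineServers = new HashSet<>();

    /** RS reported for duty: acknowledge it, but do not make it assignable yet. */
    synchronized void reportForDuty(String serverName) {
        if (!onlineServers.contains(serverName)) {
            pendingServers.add(serverName);
        }
    }

    /**
     * RS confirmed that start-up has finished (for example, after its
     * replication sources are initialized). Returns false if the server
     * never reported for duty or was already promoted.
     */
    synchronized boolean markOnline(String serverName) {
        if (pendingServers.remove(serverName)) {
            onlineServers.add(serverName);
            return true;
        }
        return false;
    }

    /** Assignment code would consult this before sending regions to a server. */
    synchronized boolean canAssignTo(String serverName) {
        return onlineServers.contains(serverName);
    }
}
```

With a gate like this, an RS that is stuck for 20 minutes initializing replication sources would simply remain pending, so the Master would not attempt the assign and unassign calls that end in FAILED_CLOSE. How the RS signals readiness (a second RPC, a flag on its regular report, or a probe from the Master) is left open here.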