virajjasani commented on code in PR #5774:
URL: https://github.com/apache/hbase/pull/5774#discussion_r1536979403


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java:
##########
@@ -324,8 +324,19 @@ public void regionServerReport(ServerName sn, ServerMetrics sl) throws YouAreDea
       // the ServerName to use. Here we presume a master has already done
       // that so we'll press on with whatever it gave us for ServerName.
       if (!checkAndRecordNewServer(sn, sl)) {
-        LOG.info("RegionServerReport ignored, could not record the server: " + sn);
-        return; // Not recorded, so no need to move on
+        // Master already registered server with same (host + port) and higher startcode.

Review Comment:
   When this happened (as per the logs mentioned on the Jira), the master 
processed the report, and that generated inconsistencies.
   
   We have seen this happen many times in the past: a regionserver is not 
really aborted but loses its connection to ZooKeeper, which triggers an SCP 
on the master. Meanwhile, the regionserver with the new startcode is not only 
alive but has also sent regionserver reports to the master. After that, the 
master somehow still receives a regionserver report for the older startcode 
and processes it, which results in inconsistencies. I know this is a rare 
case, but it has definitely happened more than once, on more than one prod 
cluster.
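
   To make the scenario concrete, here is a minimal, self-contained sketch of 
the check being discussed. It is not the code in this PR and not the actual 
`ServerManager` logic; the class `StaleReportSketch`, the method 
`shouldProcessReport`, and the `host:port` string key are all hypothetical 
names used only for illustration. The assumption is that a report carrying a 
lower startcode than the one already registered for the same host and port is 
stale and should not be processed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not HBase's ServerManager.
class StaleReportSketch {
    // Highest startcode seen so far for each "host:port" (assumed key format).
    private final Map<String, Long> registeredStartCodes = new HashMap<>();

    /**
     * Returns true if a report from (hostAndPort, startCode) should be processed,
     * false if a server with the same host and port but a higher startcode is
     * already registered, in which case the report is treated as stale.
     */
    boolean shouldProcessReport(String hostAndPort, long startCode) {
        Long known = registeredStartCodes.get(hostAndPort);
        if (known != null && known > startCode) {
            // A newer instance is already registered: ignore the old one's report.
            return false;
        }
        // First report for this host and port, or same/newer startcode: record it.
        registeredStartCodes.put(hostAndPort, startCode);
        return true;
    }
}
```

   In this sketch, once the instance with the new startcode has reported, a 
late report from the old startcode is rejected instead of being processed.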


