[ https://issues.apache.org/jira/browse/HBASE-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806562#action_12806562 ]
Kannan Muthukkaruppan commented on HBASE-2174:
----------------------------------------------

To fill in some more details, this is what we think happened during the DNS flakiness:

A regionServer periodically sends a regionServerReport (an RPC call) to the master. The call carries an HServerInfo argument that identifies the sending region server by IP address.

The master, in the ServerManager class, maintains a serversToServerInfo map which is hostname based. Every time the master receives a regionServerReport, it converts the IP-address-based name to a hostname via the info.getServerName() call. Normally this call returns the hostname, but we suspect that during the DNS flakiness it returned an IP-address-based string. This caused ServerManager.java to think that it was hearing from a new server, which led to:

    HServerInfo storedInfo = serversToServerInfo.get(info.getServerName());
    if (storedInfo == null) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Received report from unknown server -- telling it " +   <<============
          "to " + CALL_SERVER_STARTUP + ": " + info.getServerName());      <<============
      }

and bad things happened down the road (such as the region server registering itself multiple times in ZooKeeper, the cluster coming down, etc.).

The above message in our logs (example below) did identify the host in IP address syntax, even though the getServerName call would normally return the name in hostname format.

    2010-01-28 11:21:34,539 DEBUG org.apache.hadoop.hbase.master.ServerManager: Received report from unknown server -- telling it to MSG_CALL_SERVER_STARTUP: 10.129.68.203,60020,1263605543210

Perhaps all we need to do is change the ServerManager's internal maps to all be IP based? That way the master no longer has to look up the hostname on every heartbeat.
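
The suspected failure mode, and the effect of keying the map by IP, can be illustrated with a small standalone sketch. This is not the ServerManager code: the class ServerMapSketch, the resolve() helper, and the hostname rs1.example.com are hypothetical, and DNS is simulated with a boolean flag. It assumes only that a failed reverse lookup yields the IP string instead of the hostname, which is what the log line above suggests.

```java
import java.util.HashMap;
import java.util.Map;

public class ServerMapSketch {
    static final String IP = "10.129.68.203";
    static final String HOSTNAME = "rs1.example.com"; // hypothetical hostname

    /** Simulated reverse lookup (assumption): returns the IP string when DNS is unavailable. */
    static String resolve(String ip, boolean dnsHealthy) {
        return dnsHealthy ? HOSTNAME : ip;
    }

    /** Builds a "host,port,startcode" name in the same shape as the log line. */
    static String serverName(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    /** Hostname-keyed map: the lookup key depends on DNS health at report time. */
    static boolean reportRecognizedHostnameKeyed(boolean dnsHealthyAtReport) {
        Map<String, String> servers = new HashMap<>();
        // The server registered while DNS was healthy, so the stored key uses the hostname.
        servers.put(serverName(resolve(IP, true), 60020, 1263605543210L), "info");
        // A later heartbeat is resolved again; a failed lookup produces a different key.
        String key = serverName(resolve(IP, dnsHealthyAtReport), 60020, 1263605543210L);
        return servers.containsKey(key);
    }

    /** IP-keyed map: no lookup is performed, so the DNS flag is deliberately unused. */
    static boolean reportRecognizedIpKeyed(boolean dnsHealthyAtReport) {
        Map<String, String> servers = new HashMap<>();
        servers.put(serverName(IP, 60020, 1263605543210L), "info");
        String key = serverName(IP, 60020, 1263605543210L);
        return servers.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println("hostname-keyed, DNS healthy: " + reportRecognizedHostnameKeyed(true));
        System.out.println("hostname-keyed, DNS flaky:   " + reportRecognizedHostnameKeyed(false));
        System.out.println("IP-keyed, DNS flaky:         " + reportRecognizedIpKeyed(false));
    }
}
```

In the sketch, only the hostname-keyed lookup during a DNS failure misses, which corresponds to the "unknown server" branch quoted above. Keying by IP removes the dependency on the resolver, at the cost of hostnames no longer appearing in the map keys.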

> Review how we handle addresses in HBase
> ---------------------------------------
>
>                 Key: HBASE-2174
>                 URL: https://issues.apache.org/jira/browse/HBASE-2174
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>
> Over the time many parts of the code have evolved in different ways and one
> issue is that addresses are handled differently in different parts of the
> code. We need to set a standard and correct any inconsistencies.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.