[ 
https://issues.apache.org/jira/browse/HBASE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871953#action_12871953
 ] 

HBase Review Board commented on HBASE-2599:
-------------------------------------------

Message from: "Todd Lipcon" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/88/#review81
-----------------------------------------------------------

Ship it!


This seems reasonable, but the only way we'll know is by testing on various 
systems with different kinds of DNS setups. Can we get those who have had issues 
to try this and confirm that it fixes the problem before we release it?


branches/0.20/src/java/org/apache/hadoop/hbase/ClusterStatus.java
<http://review.hbase.org/r/88/#comment409>

    I don't like that this function is called getServerNames and returns 
hostnameports.



branches/0.20/src/test/org/apache/hadoop/hbase/TestServerInfo.java
<http://review.hbase.org/r/88/#comment410>

    Using the JUnit 4 "expected" attribute here doesn't quite work, because we 
can't tell that the exception wasn't thrown on line 34, right?
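
    To illustrate the concern, here is a minimal plain-Java sketch (no JUnit 
dependency) of why a method-wide expected exception can hide a failure in an 
earlier line. The class and method names (ExpectedExceptionSketch, setUp, 
callUnderTest) are hypothetical and are not taken from TestServerInfo.java. 
The first helper mimics what a method-wide expectation does; the second narrows 
the expectation to the single call that is supposed to throw.

```java
class ExpectedExceptionSketch {
    // Hypothetical earlier line in the test; throws only when 'broken' is true.
    static void setUp(boolean broken) {
        if (broken) {
            throw new IllegalArgumentException("setup failed");
        }
    }

    // Hypothetical call under test; in this sketch it always throws.
    static void callUnderTest() {
        throw new IllegalArgumentException("bad input");
    }

    // Mimics a method-wide expectation: an exception thrown anywhere in the
    // body counts as a pass, including one thrown by setUp.
    static boolean methodWideExpectation(boolean brokenSetup) {
        try {
            setUp(brokenSetup);
            callUnderTest();
            return false; // nothing thrown: test fails
        } catch (IllegalArgumentException e) {
            return true;  // thrown somewhere: test passes
        }
    }

    // Narrowed expectation: only the call under test is allowed to throw.
    static boolean narrowExpectation(boolean brokenSetup) {
        try {
            setUp(brokenSetup);
        } catch (IllegalArgumentException e) {
            return false; // exception came from the wrong line: test fails
        }
        try {
            callUnderTest();
            return false; // expected exception did not happen: test fails
        } catch (IllegalArgumentException e) {
            return true;  // exception came from the intended call: test passes
        }
    }

    public static void main(String[] args) {
        System.out.println("method-wide, broken setup: " + methodWideExpectation(true));
        System.out.println("narrow, broken setup:      " + narrowExpectation(true));
        System.out.println("narrow, healthy setup:     " + narrowExpectation(false));
    }
}
```

    In a real JUnit 4 test the narrowed form would typically be a try block 
around the one call, with a fail() right after it and a catch for the expected 
exception type.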


- Todd


On 2010-05-25 16:04:24, stack wrote:
^bq.  
^bq.  -----------------------------------------------------------
^bq.  This is an automatically generated e-mail. To reply, visit:
^bq.  http://review.hbase.org/r/88/
^bq.  -----------------------------------------------------------
^bq.  
^bq.  (Updated 2010-05-25 16:04:24)
^bq.  
^bq.  
^bq.  Review request for hbase.
^bq.  
^bq.  
^bq.  Summary
^bq.  -------
^bq.  
^bq.  Version of hbase-2599 for 0.20 branch for 0.20.5.
^bq.  
^bq.  It cuts DNS lookups from HServerInfo and uses the regionserver's name everywhere, instead of an IP in some places and a looked-up hostname in others.  Also puts hostname:port into .META. rather than the IP.  Removing DNS lookups means there is no possibility of disagreement over the hostname when the master gets one answer for an IP and the regionserver has another.
^bq.  
^bq.  Other notes:
^bq.  
^bq.  It replaces the variable servername with hostnameandport so it's clear what's going on.
^bq.  
^bq.  Does some cleanup in HServerInfo, renaming data member 'name' to 'hostname', and then purges DNS caching and lookups from this class.  It removes some unnecessary setters and does some javadoc fix-ups.
^bq.  
^bq.  Removes the dumb serverAddressToServerInfo from ServerManager.
^bq.  
^bq.  
^bq.  This addresses bug hbase-2599.
^bq.  
^bq.  
^bq.  Diffs
^bq.  -----
^bq.  
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/ClusterStatus.java 948218 
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/HServerInfo.java 948218 
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/master/BaseScanner.java 948218 
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/master/HMaster.java 948218 
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java 948218 
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/master/ServerManager.java 948218 
^bq.    branches/0.20/src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 948218 
^bq.    branches/0.20/src/test/org/apache/hadoop/hbase/TestServerInfo.java PRE-CREATION 
^bq.    branches/0.20/src/webapps/master/table.jsp 948218 
^bq.  
^bq.  Diff: http://review.hbase.org/r/88/diff
^bq.  
^bq.  
^bq.  Testing
^bq.  -------
^bq.  
^bq.  Doing now.
^bq.  
^bq.  
^bq.  Thanks,
^bq.  
^bq.  stack
^bq.  
^bq. 




> BaseScanner says "Current assignment of X is not valid" over and over for 
> same region
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-2599
>                 URL: https://issues.apache.org/jira/browse/HBASE-2599
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 2599-0.20.txt
>
>
> From IRC today
> {code}
> 12:41 < cmorgan> hey guys. I'm having a recent  issue with a single node 
> cluster running 0.20.4. After stopping for a backup I now get region 
> assignment churn. Seems master keeps thinking that region
>                  assignment is not valid even when it is. Following is a log 
> snippet:
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] DEBUG 
> ter.RegionServerOperationQueue  - Processing todo: PendingOpenOperation from 
> localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] INFO  
> e.master.RegionServerOperation  - 
> net_troove_coin_account_AccountCredentials,,1234913258116 open on 
> 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] INFO  
> e.master.RegionServerOperation  - Updated row 
> net_troove_coin_account_AccountCredentials,,1234913258116 in region .META.,,1 
> with
>                  startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] DEBUG 
> ter.RegionServerOperationQueue  - Processing todo: PendingOpenOperation from 
> localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [        HMaster] INFO  
> e.master.RegionServerOperation  - 
> net_troove_application_request_TemporaryRequest,,1234913268355 open on 
> 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [        HMaster] INFO  
> e.master.RegionServerOperation  - Updated row 
> net_troove_application_request_TemporaryRequest,,1234913268355 in region 
> .META.,,1 with
>                  startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [ger.metaScanner] DEBUG 
> adoop.hbase.master.BaseScanner  - Current assignment of 
> net_troove_coin_account_AccountEntry,,1271448856984 is not valid;
>                  serverAddress=127.0.0.1:7802, startCode=1274425405680 
> unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443248 [ger.metaScanner] DEBUG 
> adoop.hbase.master.BaseScanner  - Current assignment of 
> net_troove_coin_account_AccountEntry-Base_EntryDay_DESCENDING,,1273266418876
>                  is not valid;  serverAddress=127.0.0.1:7802, 
> startCode=1274425405680 unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443251 [ger.metaScanner] DEBUG 
> adoop.hbase.master.BaseScanner  - Current assignment of 
> net_troove_coin_bank_BankStatement,,1266433980935 is not valid;
>                  serverAddress=127.0.0.1:7802, startCode=1274425405680 
> unknown.
> 12:58 < cmorgan> stack: I'd been running with 0.20.4 for a week or so 
> starting/stopping every night. Now this happens...
> 14:11 < cmorgan> stack: some more info: On our mini production server the 
> regionserver is getting "My address is localhost.:7802" (notice the dot after 
> localhost). But the master is also sometimes
>                  referring to it as 127.0.0.1. I just used the same data and 
> config on my laptop, and its binding to my external LAN ip ("My address is 
> 10.0.1.4:7802"). Under this setup hbase comes up
>                  stable (no region assignment churn).
> {code}
> Looking at this, I think the issue is that when we register a server we use a 
> getServerName on an HServerInfo provided by the regionserver (though we are on 
> the master side), but BaseScanner uses a getServerName that is made by doing a 
> DNS lookup using the IP that it finds in the server column of .META.  My 
> sense is that it is possible for the regionserver hostname and what the master 
> finds when it does a lookup against DNS to disagree, fatally.
> This issue seems popular over the last few weeks.  It was reported at least once 
> more on a standalone instance and also on krispykola's 15-node EC2 cluster 
> (he went back to 0.20.3 and then it went away?).  It made for what looked 
> like double-assignment in his case (our attempt at caching DNS names may be 
> amiss -- I think that's the main diff between 0.20.3 and 0.20.4 in this area).
> My thought is to purge DNS from the HServerInfo passed by the RS to the master on 
> startup and heartbeating and to use IPs only (and even then, the IP that the 
> master tells the RS to use, its remote address as seen by the master).  We 
> might have to do this fix for 0.20.5 since it seems to happen more in 0.20.4.
> I'm looking into this.  Opinions welcome.
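
The suspected mismatch described above can be sketched in a few lines of plain 
Java. This is only an illustration under stated assumptions, not the actual 
HBase code: the class ServerNameMismatchSketch, its methods, and the fake 
reverse-lookup table are all hypothetical, and the lookup result "localhost" is 
assumed for the example. The "host,port,startcode" server name form and the 
values used are taken from the log snippet in the issue description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ServerNameMismatchSketch {
    static final int PORT = 7802;
    static final long START_CODE = 1274425405680L;

    // Server name in the "host,port,startcode" form seen in the log.
    static String serverName(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    // The master registers the server under the hostname the regionserver
    // reported ("localhost." with a trailing dot, as in the log).
    static Set<String> registeredServers() {
        Set<String> servers = new HashSet<String>();
        servers.add(serverName("localhost.", PORT, START_CODE));
        return servers;
    }

    // Hypothetical stand-in for a master-side reverse DNS lookup.
    // ASSUMPTION: the lookup returns "localhost" without the trailing dot.
    static String fakeReverseLookup(String ip) {
        Map<String, String> dns = new HashMap<String, String>();
        dns.put("127.0.0.1", "localhost");
        return dns.get(ip);
    }

    // Scanner builds the name from a lookup of the IP stored in .META.
    static boolean knownViaDnsLookup() {
        String host = fakeReverseLookup("127.0.0.1");
        return registeredServers().contains(serverName(host, PORT, START_CODE));
    }

    // Scanner builds the name from the reported hostname, with no lookup.
    static boolean knownViaReportedHostname() {
        String hostFromMeta = "localhost.";
        return registeredServers().contains(serverName(hostFromMeta, PORT, START_CODE));
    }

    public static void main(String[] args) {
        System.out.println("known via DNS lookup:        " + knownViaDnsLookup());
        System.out.println("known via reported hostname: " + knownViaReportedHostname());
    }
}
```

In this sketch the lookup-derived name never matches the registered one, which 
would look like an assignment that is "not valid" on every scan, while a name 
built from a single agreed-upon source always matches.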

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
