[
https://issues.apache.org/jira/browse/HBASE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871433#action_12871433
]
HBase Review Board commented on HBASE-2599:
-------------------------------------------
Message from: "Jean-Daniel Cryans" <[email protected]>
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/88/#review69
-----------------------------------------------------------
I tried it, and I think there are some other places where we need to review the
HSA stuff; see these lines I picked from a log:
2010-05-25 16:26:08,555 INFO org.apache.hadoop.hbase.master.ServerManager:
Received start message from: hbasedev,60020,1274829968544
2010-05-25 16:26:08,561 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode
/hbase/rs/1274829968544 with data 127.0.0.1:60020
2010-05-25 16:26:31,712 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_OPEN: -ROOT-,,0 from hbasedev,60020,1274829968544; 1 of 1
2010-05-25 16:26:31,726 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: SetData of ZNode
/hbase/root-region-server with 127.0.0.1:60020
2010-05-25 16:26:31,727 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {server: 127.0.0.1:60020,
regionname: -ROOT-,,0, startKey: <>}
2010-05-25 16:26:32,742 INFO
org.apache.hadoop.hbase.master.RegionServerOperation: .META.,,1 open on
hbasedev,60020,1274829968544
Basically there's a very nice mix of IPs and hostnames.
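To make the consequence of that mix concrete, here is a minimal sketch. It is
not HBase code: the class and helper names (ServerNameMismatch, serverName) are
hypothetical, and the only thing taken from the log above is the
"host,port,startcode" shape of the server name. It assumes the master registers
a server under the hostname form and later looks it up under the IP form, in
which case a plain map lookup misses.

```java
import java.util.HashMap;
import java.util.Map;

public class ServerNameMismatch {
    // Hypothetical helper: builds a "host,port,startcode" string shaped like
    // the server names that appear in the log lines above.
    static String serverName(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    public static void main(String[] args) {
        // Assumed registration path: keyed by the hostname the server reported.
        Map<String, String> onlineServers = new HashMap<>();
        onlineServers.put(serverName("hbasedev", 60020, 1274829968544L), "online");

        // Assumed lookup path: the same server, but named by its IP.
        String fromIp = serverName("127.0.0.1", 60020, 1274829968544L);

        // Same port and start code, different host string, so no match.
        System.out.println(onlineServers.containsKey(fromIp)); // prints false
    }
}
```

Any code path that compares the two forms as strings would treat the same
server as unknown, whatever the port and start code say.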
- Jean-Daniel
> BaseScanner says "Current assignment of X is not valid" over and over for
> same region
> -------------------------------------------------------------------------------------
>
> Key: HBASE-2599
> URL: https://issues.apache.org/jira/browse/HBASE-2599
> Project: HBase
> Issue Type: Bug
> Reporter: stack
>
> From IRC today
> {code}
> 12:41 < cmorgan> hey guys. I'm having a recent issue with a single node
> cluster running 0.20.4. After stopping for a backup I now get region
> assignment churn. Seems master keeps thinking that region
> assignment is not valid even when it is. Following is a log
> snippet:
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] DEBUG
> ter.RegionServerOperationQueue - Processing todo: PendingOpenOperation from
> localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] INFO
> e.master.RegionServerOperation -
> net_troove_coin_account_AccountCredentials,,1234913258116 open on
> 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] INFO
> e.master.RegionServerOperation - Updated row
> net_troove_coin_account_AccountCredentials,,1234913258116 in region .META.,,1
> with
> startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] DEBUG
> ter.RegionServerOperationQueue - Processing todo: PendingOpenOperation from
> localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] INFO
> e.master.RegionServerOperation -
> net_troove_application_request_TemporaryRequest,,1234913268355 open on
> 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [ HMaster] INFO
> e.master.RegionServerOperation - Updated row
> net_troove_application_request_TemporaryRequest,,1234913268355 in region
> .META.,,1 with
> startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [ger.metaScanner] DEBUG
> adoop.hbase.master.BaseScanner - Current assignment of
> net_troove_coin_account_AccountEntry,,1271448856984 is not valid;
> serverAddress=127.0.0.1:7802, startCode=1274425405680
> unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443248 [ger.metaScanner] DEBUG
> adoop.hbase.master.BaseScanner - Current assignment of
> net_troove_coin_account_AccountEntry-Base_EntryDay_DESCENDING,,1273266418876
> is not valid; serverAddress=127.0.0.1:7802,
> startCode=1274425405680 unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443251 [ger.metaScanner] DEBUG
> adoop.hbase.master.BaseScanner - Current assignment of
> net_troove_coin_bank_BankStatement,,1266433980935 is not valid;
> serverAddress=127.0.0.1:7802, startCode=1274425405680
> unknown.
> 12:58 < cmorgan> stack: I'd been running with 0.20.4 for a week or so
> starting/stopping every night. Now this happens...
> 14:11 < cmorgan> stack: some more info: On our mini production server the
> regionserver is getting "My address is localhost.:7802" (notice the dot after
> localhost). But the master is also sometimes
> referring to it as 127.0.0.1. I just used the same data and
> config on my laptop, and its binding to my external LAN ip ("My address is
> 10.0.1.4:7802"). Under this setup hbase comes up
> stable (no region assignment churn).
> {code}
> Looking at this, I think the issue is that when we register a server we use
> the getServerName of the HServerInfo provided by the regionserver (though we
> are on the master side), but BaseScanner uses a getServerName that is made by
> doing a DNS lookup on the IP that it finds in the server column of .META. My
> sense is that it is possible for the regionserver hostname and what the
> master finds when it does a lookup against DNS to disagree, fatally.
> This issue seems to have been popular over the last few weeks. It was
> reported at least once more on a standalone instance and also on
> krispykola's 15-node EC2 cluster (he went back to 0.20.3 and then it went
> away?). It made for what looked like double-assignment in his case (our
> attempt at caching DNS names may be amiss; I think that is the main diff
> between 0.20.3 and 0.20.4 in this area).
> My thought is to purge DNS from the HServerInfo passed by the RS to the master on
> startup and heartbeating and to use IPs only (and even then, the IP that the
> master tells the RS to use, its remote address as seen by the master). We
> might have to do this fix for 0.20.5 since it seems to happen more in 0.20.4.
> I'm looking into this. Opinions welcome.
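
The IP-only idea in the description above could be sketched roughly as follows.
This is an illustration under stated assumptions, not the actual patch: the
class IpOnlyServerKey, the method keyFor, and the "ip:port,startcode" key
layout are all hypothetical. The only standard-library calls used are
InetAddress.getByAddress, InetSocketAddress.getAddress and
InetAddress.getHostAddress, none of which performs a DNS lookup here.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class IpOnlyServerKey {
    // Hypothetical: derive the key from the remote address as observed by
    // the master, never from a hostname reported by the regionserver.
    // Assumes 'remote' is resolved; getAddress() returns null otherwise.
    static String keyFor(InetSocketAddress remote, long startCode) {
        // getHostAddress() returns the textual IP without touching DNS.
        String ip = remote.getAddress().getHostAddress();
        return ip + ":" + remote.getPort() + "," + startCode;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Built from raw bytes so that no name resolution is involved.
        InetAddress ip = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        InetSocketAddress remote = new InetSocketAddress(ip, 7802);
        System.out.println(keyFor(remote, 1274425405680L));
        // prints 127.0.0.1:7802,1274425405680
    }
}
```

With a key like this, registration and later lookups would agree as long as
both sides use the address the master saw, independent of what either host
believes its own name to be.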
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.