[
https://issues.apache.org/jira/browse/HBASE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jean-Daniel Cryans updated HBASE-2599:
--------------------------------------
Attachment: HBASE-2599-0.20-refresh.patch
I still see people with this issue from time to time, and they don't all want
to upgrade to 0.89, so here's an updated version of that patch for 0.20
Also I have a 0.20.6 distro patched with it here:
http://people.apache.org/~jdcryans/hbase-0.20.6.tar.gz
> BaseScanner says "Current assignment of X is not valid" over and over for
> same region
> -------------------------------------------------------------------------------------
>
> Key: HBASE-2599
> URL: https://issues.apache.org/jira/browse/HBASE-2599
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Fix For: 0.90.0
>
> Attachments: 2599-0.20.txt, 2599-trunk.txt,
> HBASE-2599-0.20-refresh.patch
>
>
> From IRC today
> {code}
> 12:41 < cmorgan> hey guys. I'm having a recent issue with a single node
> cluster running 0.20.4. After stopping for a backup I now get region
> assignment churn. Seems master keeps thinking that region
> assignment is not valid even when it is. Following is a log
> snippet:
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] DEBUG
> ter.RegionServerOperationQueue - Processing todo: PendingOpenOperation from
> localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] INFO
> e.master.RegionServerOperation -
> net_troove_coin_account_AccountCredentials,,1234913258116 open on
> 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] INFO
> e.master.RegionServerOperation - Updated row
> net_troove_coin_account_AccountCredentials,,1234913258116 in region .META.,,1
> with
> startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] DEBUG
> ter.RegionServerOperationQueue - Processing todo: PendingOpenOperation from
> localhost.,7802,1274425405680
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443246 [ HMaster] INFO
> e.master.RegionServerOperation -
> net_troove_application_request_TemporaryRequest,,1234913268355 open on
> 127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [ HMaster] INFO
> e.master.RegionServerOperation - Updated row
> net_troove_application_request_TemporaryRequest,,1234913268355 in region
> .META.,,1 with
> startcode=1274425405680, server=127.0.0.1:7802
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443247 [ger.metaScanner] DEBUG
> adoop.hbase.master.BaseScanner - Current assignment of
> net_troove_coin_account_AccountEntry,,1271448856984 is not valid;
> serverAddress=127.0.0.1:7802, startCode=1274425405680
> unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443248 [ger.metaScanner] DEBUG
> adoop.hbase.master.BaseScanner - Current assignment of
> net_troove_coin_account_AccountEntry-Base_EntryDay_DESCENDING,,1273266418876
> is not valid; serverAddress=127.0.0.1:7802,
> startCode=1274425405680 unknown.
> 12:41 < cmorgan> [21/05/10 00:59:42] 3443251 [ger.metaScanner] DEBUG
> adoop.hbase.master.BaseScanner - Current assignment of
> net_troove_coin_bank_BankStatement,,1266433980935 is not valid;
> serverAddress=127.0.0.1:7802, startCode=1274425405680
> unknown.
> 12:58 < cmorgan> stack: I'd been running with 0.20.4 for a week or so
> starting/stopping every night. Now this happens...
> 14:11 < cmorgan> stack: some more info: On our mini production server the
> regionserver is getting "My address is localhost.:7802" (notice the dot after
> localhost). But the master is also sometimes
> referring to it as 127.0.0.1. I just used the same data and
> config on my laptop, and its binding to my external LAN ip ("My address is
> 10.0.1.4:7802"). Under this setup hbase comes up
> stable (no region assignment churn).
> {code}
> Looking at this, I think issue is that when we register a server we use a
> getServerName on a HServerInfo provided by the regionserver (though we are on
> the master side) but BaseScanner uses a getServerName that is made by doing a
> dns lookup using the IP that it finds in the server column of .META. My
> sense is that is possible for the regionserver hostname and what the master
> finds when it does a lookup against dns can disagree, fatally.
> This issue seems popular over last few weeks. Was reported at least once
> more on a standalone instance and also on krispykola's 15-node ec2 cluster
> (He went back to 0.20.3 and then it went away?). It made for what looked
> like double-assignment in his case (Our attempt at caching DNS names may be
> amiss -- I tihnk tht the main diff between 0.20.3 and 0.20.4 in this area).
> My thought is to purge DNS from the HServerInfo passed by the RS to Master on
> startup and heartbeating and to use IPs only (and even then, the IP that the
> master tells the RS to use, its remote address as seen by the master). We
> might have to do this fix for 0.20.5 since it seems to happen more in 0.20.4.
> I'm looking into this. Opinions welcome.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.