[ https://issues.apache.org/jira/browse/HBASE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832189#action_12832189 ]
stack commented on HBASE-2189:
------------------------------

+1 on patch. I tried various combinations of shutdown of servers carrying root and meta and client kept-on keeping-on. I'd say go commit it.

> HCM trashes meta cache even when not needed
> -------------------------------------------
>
>                 Key: HBASE-2189
>                 URL: https://issues.apache.org/jira/browse/HBASE-2189
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.3
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: HBASE-2189-v2.patch, HBASE-2189.patch
>
>
> I was investigating HBASE-2175 when I saw that we are doing a lot more ROOT
> lookups than needed. For example, typical output of PE seqWrite during split:
> {code}
> client.HConnectionManager$TableServers: Removed TestTable,,1265524229864 for
> tableName=TestTable from cache because of 0000380292
> client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10
> failed; retrying after sleep of 1000 because:
> No server address listed in .META. for region
> TestTable,0000086976,1265524283534
> client.HConnectionManager$TableServers: Removed .META.,,1 for
> tableName=.META. from cache because of TestTable,0000380292,99999999999999
> client.HConnectionManager$TableServers: Cached location for .META.,,1 is
> 192.168.1.103:56279
> client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10
> failed; retrying after sleep of 1000 because:
> No server address listed in .META. for region
> TestTable,0000086976,1265524283534
> client.HConnectionManager$TableServers: Removed .META.,,1 for
> tableName=.META. from cache because of TestTable,0000380292,99999999999999
> client.HConnectionManager$TableServers: Cached location for .META.,,1 is
> 192.168.1.103:56279
> client.HConnectionManager$TableServers: Cached location for
> TestTable,0000086976,1265524283534 is 192.168.1.103:56279
> {code}
> So why exactly are we removing .META.,,1 from the cache? Because a row didn't
> have the right address? So that means we did contact .META. but the
> information we got is still stale because the split isn't finished yet... but
> why should that result in trashing the cache?
> Because we don't differentiate between NSRE / WRE and other exceptions like
> empty server address. This happens a lot more often now that the Master
> clears that cell when a region is closed instead of keeping the old value.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
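
The distinction the issue description draws between NSRE / WRE and an empty server address could be sketched roughly as below. This is only an illustration of the idea, not the contents of HBASE-2189-v2.patch: the class MetaCacheSketch, its methods, and the three exception classes are hypothetical stand-ins defined locally so the example runs on its own, and the real HConnectionManager code is not reproduced here. The assumption is that NSRE / WRE mean the cached .META. location itself is wrong, while an empty address cell means .META. answered correctly and the user region is simply not assigned yet.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MetaCacheSketch {
    // Hypothetical local stand-ins for the failure kinds named in the issue.
    // They are NOT the HBase classes; they exist only to keep this runnable.
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String msg) { super(msg); }
    }

    static class WrongRegionException extends IOException {
        WrongRegionException(String msg) { super(msg); }
    }

    // Stand-in for "No server address listed in .META. for region ...".
    static class EmptyServerAddressException extends IOException {
        EmptyServerAddressException(String msg) { super(msg); }
    }

    // Region name -> server address, e.g. ".META.,,1" -> "192.168.1.103:56279".
    private final Map<String, String> cache = new HashMap<>();

    void cacheLocation(String regionName, String address) {
        cache.put(regionName, address);
    }

    String cachedLocation(String regionName) {
        return cache.get(regionName);
    }

    // Assumption: only NSRE / WRE indicate that the cached .META. location
    // is wrong. An empty address means .META. was reached successfully.
    static boolean shouldEvictMeta(IOException cause) {
        return cause instanceof NotServingRegionException
            || cause instanceof WrongRegionException;
    }

    // Called when a lookup through the given .META. region fails.
    void onLookupFailure(String metaRegionName, IOException cause) {
        if (shouldEvictMeta(cause)) {
            cache.remove(metaRegionName);
        }
        // Otherwise keep the entry; the caller just sleeps and retries.
    }

    public static void main(String[] args) {
        MetaCacheSketch client = new MetaCacheSketch();
        String meta = ".META.,,1";
        client.cacheLocation(meta, "192.168.1.103:56279");

        // Region is mid-split: .META. answered, the address cell is empty.
        client.onLookupFailure(meta,
            new EmptyServerAddressException("No server address listed"));
        System.out.println("after empty address: " + client.cachedLocation(meta));

        // The server holding .META. no longer serves it.
        client.onLookupFailure(meta,
            new NotServingRegionException(meta));
        System.out.println("after NSRE: " + client.cachedLocation(meta));
    }
}
```

With a policy like this, the retry loop in the log above would keep its cached .META.,,1 location across the "No server address listed" attempts instead of going back to ROOT each time.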