[ https://issues.apache.org/jira/browse/HBASE-25130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274369#comment-17274369 ]
Rahul Kumar commented on HBASE-25130:
-------------------------------------

While trying to clear overlapping region entries from the _serverHoldings_ map when running hbck repair, I noticed that the HRI key held in _regionAssignments_ (_TreeMap<HRegionInfo, ServerName>_) is not equal to the HRI that has to be offlined.

For example, *{ENCODED => fa18b66587f8f7a1de791ffefe364a48, NAME => 'test,,1611911426615.fa18b66587f8f7a1de791ffefe364a48.', STARTKEY => '', ENDKEY => ''}* is the _HRI_ metadata in the _regionAssignments_ map, whereas *{ENCODED => fa18b66587f8f7a1de791ffefe364a48, NAME => 'test,,1611911426615.fa18b66587f8f7a1de791ffefe364a48.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true}* is the _HRI_ metadata of the region that has to go offline. So when it tries to remove that HRI from _regionAssignments_ via _regionAssignments.remove(hri)_, it does not find any matching entry and therefore cannot proceed with the _regionOffline_ operation.

I am confused here: shouldn't both _HRI_ objects, i.e. the one that has to go offline and the one in the _regionAssignments_ map, point to the same object? A random thought: do we need to update the equals logic of HRI (to compare on the basis of _encodedRegionName_) so that both of the above are considered equal?

[~vjasani] [~apurtell] Can you please help? Thanks.

Btw, I reproduced the overlap scenario by adding a bug in the split and rollback scenario, if that matters.

> Masters in-memory serverHoldings map is not cleared during hbck repair
> ----------------------------------------------------------------------
>
>                 Key: HBASE-25130
>                 URL: https://issues.apache.org/jira/browse/HBASE-25130
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Rahul Kumar
>            Priority: Major
>
> In case of repairing overlaps, hbck essentially calls the closeRegion RPC on
> the RS, followed by the offline RPC on the Master, to offline all the overlap
> regions that would be merged into a new region.
>
> However, the offline RPC doesn't remove the region from the serverHoldings map
> unless the new state is MERGED/SPLIT
> ([https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L719]),
> but the new state in this case is OFFLINE.
>
> This is actually intended to match the META entries, and the entry would be
> removed later when the region comes online on a different server. However, in
> our case the region would never come online on a new server, hence the region
> info is never cleared from the map, which the balancer and SCP use, leading to
> incorrect reassignment.
>
> We might need to tackle this by removing the entries from the map when hbck
> actually deletes the meta entries for this region, which keeps the in-memory
> map consistent with the META state.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
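The lookup failure described in the comment can be reproduced in a small, self-contained model. This is a sketch under stated assumptions, not HBase code: `RegionLookupSketch`, `RegionInfo`, `assign` and `removeByEncodedName` are hypothetical names, and the model assumes (as the mismatch above suggests) that HRegionInfo's ordering also takes the offline flag into account. Because `TreeMap` locates keys with `compareTo`, a copy of the region flagged OFFLINE does not match the stored key, while resolving the stored key by encoded name first does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model for illustration only; none of these classes are HBase's.
public class RegionLookupSketch {

  /** Simplified stand-in for HRegionInfo; the offline flag takes part in ordering. */
  static final class RegionInfo implements Comparable<RegionInfo> {
    final String table, startKey, endKey, encodedName;
    final long regionId;
    final boolean offline, split;

    RegionInfo(String table, String startKey, String endKey, long regionId,
        String encodedName, boolean offline, boolean split) {
      this.table = table;
      this.startKey = startKey;
      this.endKey = endKey;
      this.regionId = regionId;
      this.encodedName = encodedName;
      this.offline = offline;
      this.split = split;
    }

    @Override
    public int compareTo(RegionInfo o) {
      int c = table.compareTo(o.table);
      if (c == 0) c = startKey.compareTo(o.startKey);
      if (c == 0) c = endKey.compareTo(o.endKey);
      if (c == 0) c = Long.compare(regionId, o.regionId);
      // Assumed tiebreaker: otherwise identical regions differ if only one is offline.
      if (c == 0) c = Boolean.compare(offline, o.offline);
      return c;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof RegionInfo && compareTo((RegionInfo) o) == 0;
    }

    @Override
    public int hashCode() {
      // Uses the same fields as compareTo so equals and hashCode stay consistent.
      return Objects.hash(table, startKey, endKey, regionId, offline);
    }
  }

  // Region -> hosting server, ordered by compareTo (like TreeMap<HRegionInfo, ServerName>).
  final TreeMap<RegionInfo, String> regionAssignments = new TreeMap<>();
  // Server -> regions it is believed to host.
  final Map<String, Set<RegionInfo>> serverHoldings = new HashMap<>();

  void assign(RegionInfo region, String server) {
    regionAssignments.put(region, server);
    serverHoldings.computeIfAbsent(server, s -> new HashSet<>()).add(region);
  }

  /** Hypothetical cleanup keyed on the encoded name, so the flags do not matter. */
  boolean removeByEncodedName(String encodedName) {
    RegionInfo stored = regionAssignments.keySet().stream()
        .filter(k -> k.encodedName.equals(encodedName))
        .findFirst()
        .orElse(null);
    if (stored == null) {
      return false;
    }
    String server = regionAssignments.remove(stored);
    Set<RegionInfo> held = serverHoldings.get(server);
    // Drop the region from its server's holdings, and the server entry once empty.
    if (held != null && held.remove(stored) && held.isEmpty()) {
      serverHoldings.remove(server);
    }
    return true;
  }

  public static void main(String[] args) {
    RegionLookupSketch states = new RegionLookupSketch();
    RegionInfo assigned = new RegionInfo("test", "", "", 1611911426615L,
        "fa18b66587f8f7a1de791ffefe364a48", false, false);
    RegionInfo toOffline = new RegionInfo("test", "", "", 1611911426615L,
        "fa18b66587f8f7a1de791ffefe364a48", true, true);
    states.assign(assigned, "rs1");
    // Removing by object fails: the offline flag makes the keys compare as different.
    System.out.println(states.regionAssignments.remove(toOffline)); // null
    // Removing by encoded name finds the stored key and clears both maps.
    System.out.println(states.removeByEncodedName(toOffline.encodedName)); // true
    System.out.println(states.serverHoldings.isEmpty()); // true
  }
}
```

Resolving the stored key by encoded name avoids changing `equals` for every collection that holds region infos, which is the broader change the comment floats. The linear scan over `regionAssignments` only keeps the sketch short; a secondary map keyed by encoded name would avoid it.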