[ 
https://issues.apache.org/jira/browse/HBASE-25130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274369#comment-17274369
 ] 

Rahul Kumar commented on HBASE-25130:
-------------------------------------

While trying to clear overlapped region entries from the _serverHoldings_ map 
during an hbck repair, I noticed that the HRI stored as a key in 
_regionAssignments_ (_TreeMap<HRegionInfo, ServerName>_) is not equal to the 
HRI that has to be offlined. For example, 

*{ENCODED => fa18b66587f8f7a1de791ffefe364a48, NAME => 
'test,,1611911426615.fa18b66587f8f7a1de791ffefe364a48.', STARTKEY => '', ENDKEY 
=> ''}* is the _HRI_ metadata in the _regionAssignments_ map, whereas 

*{ENCODED => fa18b66587f8f7a1de791ffefe364a48, NAME => 
'test,,1611911426615.fa18b66587f8f7a1de791ffefe364a48.', STARTKEY => '', ENDKEY 
=> '', OFFLINE => true, SPLIT => true}* is the _HRI_ metadata of the region 
that has to go offline. 
So, when it tries to remove the HRI to be offlined from 
_regionAssignments_ via _regionAssignments.remove(hri)_, it cannot find a 
matching entry and thus cannot proceed with the _regionOffline_ operation. 

I am confused here: shouldn't both _HRI_ objects, i.e. the one that has to go 
offline and the one in the _regionAssignments_ map, refer to the same object?
A random thought: do we need to change the logic of _equals_ for HRI (to 
compare on the basis of _encodedRegionName_) so that the two HRIs above are 
considered equal? 
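
To make the suspected mismatch concrete, here is a minimal standalone sketch. 
It is not HBase code: _RegionKey_ is a hypothetical stand-in for HRI, and the 
idea that the ordering also looks at the OFFLINE flag is my assumption, which 
should be checked against the actual _compareTo_/_equals_ in branch-1. It only 
shows that a _TreeMap_ lookup misses when the probe key compares as different 
from the stored key, and hits when only the encoded name is compared.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical stand-in for HRegionInfo; NOT the real HBase class.
final class RegionKey implements Comparable<RegionKey> {
    final String encodedName;
    final boolean offline;

    RegionKey(String encodedName, boolean offline) {
        this.encodedName = encodedName;
        this.offline = offline;
    }

    // Assumption: the ordering also considers the offline flag, so two keys
    // for the same region can compare as different.
    @Override
    public int compareTo(RegionKey other) {
        int c = encodedName.compareTo(other.encodedName);
        if (c != 0) {
            return c;
        }
        return Boolean.compare(other.offline, offline);
    }
}

final class RegionKeySketch {
    static final String ENCODED = "fa18b66587f8f7a1de791ffefe364a48";

    // Stores an online key, then tries to remove it using an offline key.
    // TreeMap locates entries through the comparator, not through equals().
    static boolean removeFinds(Comparator<RegionKey> order) {
        TreeMap<RegionKey, String> regionAssignments = new TreeMap<>(order);
        regionAssignments.put(new RegionKey(ENCODED, false), "rs1");
        return regionAssignments.remove(new RegionKey(ENCODED, true)) != null;
    }

    // Ordering that takes the offline flag into account (the assumed current behavior).
    static boolean removeWithFlagAwareOrder() {
        Comparator<RegionKey> natural = Comparator.naturalOrder();
        return removeFinds(natural);
    }

    // Ordering on the encoded name only (the change floated above).
    static boolean removeWithNameOnlyOrder() {
        return removeFinds(Comparator.comparing((RegionKey k) -> k.encodedName));
    }

    public static void main(String[] args) {
        System.out.println("flag-aware order removes entry: " + removeWithFlagAwareOrder());
        System.out.println("name-only order removes entry:  " + removeWithNameOnlyOrder());
    }
}
```

In this sketch the flag-aware ordering leaves the entry in the map, which is 
the same symptom as above, while the name-only ordering removes it. Whether 
changing the comparison in HRI itself is safe for other callers is a separate 
question.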

[~vjasani] [~apurtell] Can you please help? Thanks.

By the way, I reproduced the overlap scenario by introducing a bug in the 
split-and-rollback path, in case that matters.

> Masters in-memory serverHoldings map is not cleared during hbck repair
> ----------------------------------------------------------------------
>
>                 Key: HBASE-25130
>                 URL: https://issues.apache.org/jira/browse/HBASE-25130
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Rahul Kumar
>            Priority: Major
>
> In case of repairing overlaps, hbck essentially calls the 
> closeRegion RPC on the RS, followed by the offline RPC on the Master, to 
> offline all the overlapping regions that would be merged into a new region. 
> However, the offline RPC doesn’t remove the region from the 
> serverHoldings map unless the new state is MERGED/SPLIT 
> ([https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L719]), 
> but the new state in this case is OFFLINE. 
> This is actually intended to match the META entries, and the entry 
> would be removed later when the region comes online on a different server. 
> However, in our case, the region would never come online on a new server, 
> hence the region info is never cleared from the map, which is then used by 
> the balancer and SCP, leading to incorrect reassignment. 
> We might need to tackle this by removing the entries from the 
> map when hbck actually deletes the meta entries for 
> this region, which would align the in-memory map’s expectation with the 
> META state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
