[ https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allan Yang updated HBASE-21421:
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.2
                   2.0.3
                   3.0.0
           Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks for reviewing, [~Apache9].

> Do not kill RS if reportOnlineRegions fails
> -------------------------------------------
>
>                 Key: HBASE-21421
>                 URL: https://issues.apache.org/jira/browse/HBASE-21421
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.1, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>             Fix For: 3.0.0, 2.0.3, 2.1.2
>
>         Attachments: HBASE-21421.branch-2.0.001.patch,
> HBASE-21421.branch-2.0.002.patch, HBASE-21421.branch-2.0.003.patch,
> HBASE-21421.branch-2.0.004.patch
>
>
> In the periodic regionServerReport from the RS to the master, we call
> master.getAssignmentManager().reportOnlineRegions() to make sure the RS has
> the same state as the master. If the RS holds a region that the master thinks
> should be on another RS, the master kills the RS.
> However, the regionServerReport can lag (due to the network or something
> else), in which case it does not represent the current state of the
> RegionServer. Besides, when onlining a region we call
> reportRegionStateTransition and retry forever until it is successfully
> reported to the master, so we can count on the reportRegionStateTransition
> calls.
> I have encountered cases where regions were closed on the RS and
> reportRegionStateTransition reported this to the master successfully. Later,
> a lagging regionServerReport told the master that the region was online on
> the RS (which was no longer true; the call may have been generated some time
> earlier and delayed by the network). The master then thought the region
> should be on another RS and killed the RS, which should not happen.
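
The race described above can be illustrated with a small standalone model. This is only a sketch and not the code in the attached patches: the class LaggingReportSketch and its methods regionOpened, regionClosed and locationOf are hypothetical names, and its reportOnlineRegions only borrows the name of the real method. It assumes the master treats transition reports as authoritative and treats a mismatch found in the periodic report as something to log rather than a reason to kill the RS.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the master's bookkeeping; not HBase code.
public class LaggingReportSketch {
    // Master's view: region name -> server the master believes hosts it.
    private final Map<String, String> regionLocations = new HashMap<>();

    // Models a successful reportRegionStateTransition for an opened region.
    public void regionOpened(String region, String server) {
        regionLocations.put(region, server);
    }

    // Models a successful reportRegionStateTransition for a closed region.
    // The entry is removed only if it still points at this server.
    public void regionClosed(String region, String server) {
        regionLocations.remove(region, server);
    }

    // Models the periodic report. Returns the regions whose reported location
    // disagrees with the master's view. The master's view is left unchanged,
    // and the caller is expected to log the mismatches, not kill the server.
    public List<String> reportOnlineRegions(String server, Set<String> onlineRegions) {
        List<String> mismatches = new ArrayList<>();
        for (String region : onlineRegions) {
            String expected = regionLocations.get(region);
            if (!server.equals(expected)) {
                mismatches.add(region);
            }
        }
        return mismatches;
    }

    public String locationOf(String region) {
        return regionLocations.get(region);
    }

    public static void main(String[] args) {
        LaggingReportSketch master = new LaggingReportSketch();
        master.regionOpened("region-a", "rs1");

        // rs1 builds its periodic report now, but the report is delayed.
        Set<String> staleReport = Collections.singleton("region-a");

        // Meanwhile the region is closed on rs1 and reopened on rs2,
        // and both transitions are reported to the master successfully.
        master.regionClosed("region-a", "rs1");
        master.regionOpened("region-a", "rs2");

        // The stale report finally arrives: warn, do not kill rs1.
        List<String> mismatches = master.reportOnlineRegions("rs1", staleReport);
        System.out.println("WARN stale report from rs1, mismatched regions: " + mismatches);
        System.out.println("Master still places region-a on: " + master.locationOf("region-a"));
    }
}
```

In this model the delayed report from rs1 is flagged as a mismatch, while the master's record, built from the transition reports, still places the region on rs2.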