Add simple "anti-entropy" for region assignment
-----------------------------------------------
Key: HBASE-2486
URL: https://issues.apache.org/jira/browse/HBASE-2486
Project: Hadoop HBase
Issue Type: Improvement
Components: master, regionserver
Affects Versions: 0.20.5
Reporter: Todd Lipcon
We've seen a number of bugs where a region server thinks it should not be
serving a region, but the master and META think it should be. I'd like to
propose a very simple way of fixing this issue:
1) whenever a regionserver throws a NotServingRegionException, it also records
that region's id in an RS-wide Set
2) when a regionserver sends a heartbeat, it includes a message for each of
these regions (MSG_REPORT_NSRE or somesuch) and then clears the set
3) when the master receives MSG_REPORT_NSRE, it does the following checks:
  a) if the region is assigned elsewhere according to META, the NSRE was due
     to a stale client; ignore it
  b) if the region is in transition, ignore it
  c) otherwise, we have an inconsistency, and we should take some steps to
     resolve it (e.g. mark the region unassigned, or exit the master if we
     are in "paranoid mode")
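
To make the three steps concrete, here is a rough, self-contained Java sketch.
It is not HBase code: NsreTracker, MasterView, Verdict and check() are
hypothetical names invented for illustration, regions and servers are plain
Strings, and it uses modern Java collections rather than anything from the
0.20 codebase. How the report is actually carried in the heartbeat, and what
the master does on an inconsistency, are left open as in the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; all names below are hypothetical, not HBase APIs.
final class NsreAntiEntropySketch {

    /** Outcome of the master-side checks for one reported region. */
    enum Verdict { STALE_CLIENT, IN_TRANSITION, INCONSISTENT }

    /** Region server side: the RS-wide set of regions that produced an NSRE. */
    static final class NsreTracker {
        private final Set<String> pending = ConcurrentHashMap.newKeySet();

        /** Step 1: call this wherever NotServingRegionException is thrown. */
        void recordNsre(String regionName) {
            pending.add(regionName);
        }

        /** Step 2: move the recorded regions into the next heartbeat. */
        List<String> drainForHeartbeat() {
            List<String> report = new ArrayList<>();
            // Remove entries one by one so that an NSRE recorded while we
            // drain is either reported now or kept for the next heartbeat.
            for (String region : pending) {
                if (pending.remove(region)) {
                    report.add(region);
                }
            }
            return report;
        }
    }

    /** Hypothetical read-only view of what the master knows. */
    interface MasterView {
        /** Server that META says holds the region, if any. */
        Optional<String> assignedServerInMeta(String regionName);

        boolean isInTransition(String regionName);
    }

    /** Step 3: classify one MSG_REPORT_NSRE-style report. */
    static Verdict check(MasterView master, String reportingServer,
                         String regionName) {
        Optional<String> owner = master.assignedServerInMeta(regionName);
        // (a) assigned elsewhere: a stale client asked the wrong server
        if (owner.isPresent() && !owner.get().equals(reportingServer)) {
            return Verdict.STALE_CLIENT;
        }
        // (b) region is moving; the report is expected
        if (master.isInTransition(regionName)) {
            return Verdict.IN_TRANSITION;
        }
        // (c) META points at the reporter (or nowhere) and nothing is moving
        return Verdict.INCONSISTENT;
    }
}
```

A caller on the master would log loudly on INCONSISTENT and then either
unassign the region or abort, depending on whether "paranoid mode" is enabled.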
Whatever we do, we need to make sure that a detected inconsistency is loudly
logged and causes unit tests to fail. This should *not* happen, but when it
does, it would be good to recover without resorting to addtable.rb, etc.