[
https://issues.apache.org/jira/browse/HBASE-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673408#comment-13673408
]
Jeffrey Zhong commented on HBASE-8666:
--------------------------------------
{quote}
In what scenarios would previouslyFailedServers not suffice alone? Won't
previouslyFailedMetaRSs always be a subset of previouslyFailedServers?
{quote}
When I ran tests that killed the RS and the Master in random order, I ended up
with a previouslyFailedMetaRSs that was not a subset of previouslyFailedServers.
The result is bad because META can never leave the recovering state. Hence the
v3 patch, which makes sure .META. leaves the recovering state even if data
integrity was already broken before the master started up.
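To make the failure mode concrete, here is a minimal sketch, assuming server names are plain strings. It shows why the subset assumption is unsafe and one defensive way to handle it: take the union of both sets when deciding which servers' logs to recover. The class and method names are hypothetical; this is an illustration, not the code in hbase-8666-v3.patch.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not HBase's actual master code.
public class MetaRecoverySketch {

    // True when every server that previously hosted a failed META region is
    // also recorded as a previously failed server (the assumption in question).
    static boolean isSubset(Set<String> previouslyFailedMetaRSs,
                            Set<String> previouslyFailedServers) {
        return previouslyFailedServers.containsAll(previouslyFailedMetaRSs);
    }

    // Defensive choice: recover from the union of both sets, so a META host
    // missing from previouslyFailedServers is still processed.
    static Set<String> serversToRecover(Set<String> previouslyFailedServers,
                                        Set<String> previouslyFailedMetaRSs) {
        Set<String> all = new TreeSet<>(previouslyFailedServers);
        all.addAll(previouslyFailedMetaRSs);
        return all;
    }

    public static void main(String[] args) {
        // Hypothetical state after random RS/Master kills: rs2, an old META
        // host, is absent from the failed-server list.
        Set<String> failed = new TreeSet<>(Set.of("rs1"));
        Set<String> failedMeta = new TreeSet<>(Set.of("rs1", "rs2"));
        System.out.println(isSubset(failedMeta, failed));         // false
        System.out.println(serversToRecover(failed, failedMeta)); // [rs1, rs2]
    }
}
```

Relying on previouslyFailedServers alone in this state would skip rs2, which is the kind of gap that leaves META stuck in the recovering state.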
{quote}
Something similar in removeRecoveringRegionsFromZK() too?
{quote}
The reason for initializing removeRecoveringRegionsFromZK to 0 is to let the
recovering-region GC run once after the master is initialized, so that any
stale recovering regions are removed. The call is triggered inside
TimeoutMonitor#removeRecoveringRegionsFromZK(null, null). This change is a
nice-to-have.
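The "initialize to 0" trick can be illustrated with a small sketch, assuming the GC is gated by a last-run timestamp and a fixed interval that a periodic monitor checks. The class, the field lastGcTimeMs, and the interval are hypothetical; the sketch only shows why a zero initial value forces one GC pass on the first check after startup.

```java
// Hypothetical sketch; names and structure are illustrative, not HBase's.
public class RecoveringRegionGcSketch {
    private final long intervalMs;
    // Initialized to 0 so the first check after master initialization is
    // always "overdue" and the GC runs once right away.
    private long lastGcTimeMs = 0L;

    RecoveringRegionGcSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Called periodically (e.g. by a timeout monitor) with the current time.
    // Returns true when a GC pass was performed.
    boolean maybeRunGc(long nowMs) {
        if (nowMs - lastGcTimeMs < intervalMs) {
            return false; // interval not yet elapsed since the last pass
        }
        lastGcTimeMs = nowMs;
        // A real implementation would delete stale recovering-region znodes here.
        return true;
    }

    public static void main(String[] args) {
        RecoveringRegionGcSketch gc = new RecoveringRegionGcSketch(60_000L);
        long now = System.currentTimeMillis();
        System.out.println(gc.maybeRunGc(now));     // true: first check after startup
        System.out.println(gc.maybeRunGc(now + 1)); // false: interval not yet elapsed
    }
}
```

Had lastGcTimeMs been initialized to the startup time instead, the first pass would wait a full interval, leaving stale recovering regions in place during that window.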
Thanks.
> META region isn't fully recovered during master initialization when META
> region recovery had chained failures
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-8666
> URL: https://issues.apache.org/jira/browse/HBASE-8666
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 0.98.0, 0.95.2
>
> Attachments: hbase-8666.patch, hbase-8666-v2.patch,
> hbase-8666-v3.patch
>
>
> In distributedLogReplay mode, when META recovery has experienced chained
> failures (recovery failed multiple times in a row), the META region can't be
> fully recovered while the master starts up.
>