[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843572#comment-13843572 ]
Ted Yu commented on HBASE-10000:
--------------------------------
Here is how patch v6 addresses the above scenario:
{code}
if (leaseRecoveryReqTS == HConstants.LEASE_RECOVERY_UNREQUESTED ||
    nbAttempt > 0) {
  startWaiting = EnvironmentEdgeManager.currentTimeMillis();
  if (recoverLease(dfs, nbAttempt, p, startWaiting)) return true;
}
...
if (nbAttempt == 0 && leaseRecoveryReqTS !=
    HConstants.LEASE_RECOVERY_UNREQUESTED) {
  firstPause -= (EnvironmentEdgeManager.currentTimeMillis() -
      leaseRecoveryReqTS);
}
if (nbAttempt == 0 && isFileClosedMeth == null) {
  if (firstPause > 0) Thread.sleep(firstPause);
  else continue;
} else {
{code}
If the master initiated the recovery more than 4 seconds ago AND isFileClosed
is not available on the region server, firstPause would be negative. In that
case the code skips the sleep, continues with iteration #2 and starts lease
recovery, which keeps the previous behavior.
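
To make the arithmetic concrete, here is a minimal standalone sketch of the two
first-iteration decisions in the excerpt above. It is not the patch itself: the
class and method names are made up for illustration, the sentinel value is a
stand-in for HConstants.LEASE_RECOVERY_UNREQUESTED (whose real value is not
shown in this comment), and the 4000 ms first pause is assumed from the
"4 seconds" mentioned above.

```java
final class FirstPauseSketch {
    // Hypothetical stand-in for HConstants.LEASE_RECOVERY_UNREQUESTED.
    static final long LEASE_RECOVERY_UNREQUESTED = -1L;

    // The worker calls recoverLease itself only when the master did not
    // already request recovery, or on any attempt after the first one.
    static boolean shouldCallRecoverLease(long leaseRecoveryReqTS, int nbAttempt) {
        return leaseRecoveryReqTS == LEASE_RECOVERY_UNREQUESTED || nbAttempt > 0;
    }

    // Remaining first pause once the time elapsed since the master's
    // request is subtracted. A result <= 0 means "do not sleep, go straight
    // to the next attempt".
    static long remainingFirstPause(long firstPauseMs, long leaseRecoveryReqTS, long nowMs) {
        if (leaseRecoveryReqTS == LEASE_RECOVERY_UNREQUESTED) {
            return firstPauseMs;
        }
        return firstPauseMs - (nowMs - leaseRecoveryReqTS);
    }

    public static void main(String[] args) {
        long firstPauseMs = 4000L; // assumed first pause of 4 seconds
        // Master requested recovery 5 seconds before "now".
        long remaining = remainingFirstPause(firstPauseMs, 1000L, 6000L);
        System.out.println("remaining first pause (ms): " + remaining);
        System.out.println("call recoverLease on attempt 0: "
            + shouldCallRecoverLease(1000L, 0));
    }
}
```

With a request that is 5 seconds old, the remaining pause is negative, so the
worker would skip the sleep and call recoverLease on the next attempt.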
I am trying to come up with a test for this scenario. I plan to lift
startWaiting to an instance variable so that the test can query it and verify
that we don't wait 1 extra minute.
Does this sound good?
> Initiate lease recovery for outstanding WAL files at the very beginning of
> recovery
> -----------------------------------------------------------------------------------
>
> Key: HBASE-10000
> URL: https://issues.apache.org/jira/browse/HBASE-10000
> Project: HBase
> Issue Type: Improvement
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 0.98.1
>
> Attachments: 10000-0.96-v5.txt, 10000-0.96-v6.txt,
> 10000-recover-ts-with-pb-2.txt, 10000-recover-ts-with-pb-3.txt,
> 10000-recover-ts-with-pb-4.txt, 10000-recover-ts-with-pb-5.txt,
> 10000-recover-ts-with-pb-6.txt, 10000-v4.txt, 10000-v5.txt, 10000-v6.txt
>
>
> At the beginning of recovery, master can send lease recovery requests
> concurrently for outstanding WAL files using a thread pool.
> Each split worker would first check whether the WAL file it processes is
> closed.
> Thanks to Nicolas Liochon and Jeffery, discussions with whom gave rise to
> this idea.
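
The master-side idea in the description can be sketched roughly as follows.
This is only an illustration under stated assumptions: the class name, the
requestAll method, and the Predicate standing in for the per-file lease
recovery call are all hypothetical, and WAL files are represented as plain
strings rather than HDFS paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ConcurrentLeaseRecoverySketch {
    // Sends one lease recovery request per WAL file through a fixed-size
    // thread pool and returns how many files were reported as closed.
    // recoverLease is a hypothetical stand-in for the real per-file call.
    static int requestAll(List<String> walPaths, Predicate<String> recoverLease, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int closed = 0;
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String path : walPaths) {
                tasks.add(() -> recoverLease.test(path));
            }
            // invokeAll blocks until every request has completed.
            for (Future<Boolean> result : pool.invokeAll(tasks)) {
                try {
                    if (result.get()) closed++;
                } catch (ExecutionException e) {
                    // A failed request is treated as "not closed yet".
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        } finally {
            pool.shutdown();
        }
        return closed;
    }
}
```

Files not counted as closed here would be the ones a split worker still has to
check, and possibly wait on, before processing.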