[ https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615341#comment-13615341 ]
Nicolas Liochon commented on HBASE-7878:
----------------------------------------
bq. I understand that in case false is returned from recoverLease, we would
wait longer.
Yes. We were not waiting before, so we were wrong but fast. It seems that the
recovery takes 5s.
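To make the "wrong but fast" vs. "correct but slower" trade-off concrete, here is a minimal sketch of checking the return value and retrying. It is not the code from any of the attached patches. The nested Recoverer interface is a hypothetical stand-in for DistributedFileSystem.recoverLease(Path), which returns true once the file is closed; the pause and timeout parameters are assumptions for illustration.

```java
import java.io.IOException;

final class WaitForLeaseRecovery {
    /** Hypothetical stand-in for the single HDFS call this sketch needs. */
    interface Recoverer {
        /** Returns true once the file is closed, false while recovery is still running. */
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Calls recoverLease until it returns true or the timeout expires.
     * Returns true only if the file was confirmed closed.
     */
    static boolean recoverAndWait(Recoverer fs, String path, long pauseMs, long timeoutMs)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (fs.recoverLease(path)) {
                return true;              // file is closed, safe to read
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;             // gave up; caller must not assume the file is closed
            }
            Thread.sleep(pauseMs);        // this pause is the extra latency discussed here
        }
    }
}
```

With a 1s pause, a recovery that completes right after the first call still costs one full pause before the second call can observe it.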
bq. One remedy I can think of is to bundle lease recovery for several files
together so that the extra wait can be amortized.
It seems to be a good idea. I'm continuing to investigate (there are other
issues as well), but this one seems to be a clear quick win: even if the
recovery is immediate, we now wait 1s (the first call returns false, so we wait
1s and then retry).
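A rough sketch of the bundling idea, under stated assumptions: trigger recovery on every file first, then pause once per round and re-check only the files that are still pending, so N files share one wait instead of paying N waits. The nested Recoverer interface is again a hypothetical stand-in for DistributedFileSystem.recoverLease(Path), and recoverAll, pauseMs and maxRounds are names made up for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BundledLeaseRecovery {
    /** Hypothetical stand-in for the single HDFS call this sketch needs. */
    interface Recoverer {
        /** Returns true once the file is closed, false while recovery is still running. */
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Runs lease recovery over all paths in rounds.
     * Returns the paths that were still not closed after maxRounds rounds.
     */
    static List<String> recoverAll(Recoverer fs, List<String> paths, long pauseMs, int maxRounds)
            throws IOException, InterruptedException {
        List<String> pending = new ArrayList<>(paths);
        for (int round = 0; round < maxRounds && !pending.isEmpty(); round++) {
            if (round > 0) {
                Thread.sleep(pauseMs);    // one pause per round, shared by all pending files
            }
            Iterator<String> it = pending.iterator();
            while (it.hasNext()) {
                if (fs.recoverLease(it.next())) {
                    it.remove();          // closed, no need to ask again
                }
            }
        }
        return pending;
    }
}
```

If every file needs exactly one retry, this costs a single pause in total, however many files there are. The caller still has to handle a non-empty result, since those files were never confirmed closed.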
Why is the lease recovery done by the regionserver rather than the master, btw?
Moreover, I would also expect to have just a few WAL files open on the HDFS
side (the one for .meta., the current one, maybe the previous one if we have
just rolled?). Maybe we should call lease recovery on these first?
> recoverFileLease does not check return value of recoverLease
> ------------------------------------------------------------
>
> Key: HBASE-7878
> URL: https://issues.apache.org/jira/browse/HBASE-7878
> Project: HBase
> Issue Type: Bug
> Components: util
> Affects Versions: 0.95.0, 0.94.6
> Reporter: Eric Newton
> Assignee: Ted Yu
> Priority: Critical
> Fix For: 0.95.0, 0.98.0
>
> Attachments: 7878.94, 7878-94.addendum, 7878-94.addendum2,
> 7878-addendum.txt, 7878-trunk.addendum, 7878-trunk.addendum2,
> 7878-trunk-v10.txt, 7878-trunk-v11-test.txt, 7878-trunk-v12.txt,
> 7878-trunk-v13.txt, 7878-trunk-v14.txt, 7878-trunk-v15.patch,
> 7878-trunk-v16.txt, 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt,
> 7878-trunk-v5.txt, 7878-trunk-v6.txt, 7878-trunk-v7.txt, 7878-trunk-v8.txt,
> 7878-trunk-v9.txt, 7878-trunk-v9.txt
>
>
> I think this is a problem, so I'm opening a ticket so an HBase person takes a
> look.
> Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease
> recovery for Accumulo after HBase's lease recovery. During testing, we
> experienced data loss. I found it is necessary to wait until recoverLease
> returns true to know that the file has been truly closed. In FSHDFSUtils,
> the return result of recoverLease is not checked. In the unit tests created
> to check lease recovery in HBASE-2645, the return result of recoverLease is
> always checked.
> I think FSHDFSUtils should be modified to check the return result, and wait
> until it returns true.