[ https://issues.apache.org/jira/browse/HBASE-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522270#comment-14522270 ]
Sean Busbey commented on HBASE-13602:
-------------------------------------
Having no timeout also has the same problem: folks who had slow-to-recover
problems suddenly have hanging-forever problems.
For example, the cluster I saw this on definitely wouldn't have had data loss,
because I manually ssh'd to each node and verified there were no old RS
processes. My FileSystem instance was failing all lease recovery, so without
the timeout it would never have recovered.
> Add an option to fail wal recovery when lease recovery fails
> ------------------------------------------------------------
>
> Key: HBASE-13602
> URL: https://issues.apache.org/jira/browse/HBASE-13602
> Project: HBase
> Issue Type: Improvement
> Components: wal
> Reporter: Sean Busbey
> Labels: operability
> Fix For: 2.0.0, 1.2.0
>
>
> Currently, if lease recovery doesn't succeed within an extended timeout
> (default 15 minutes), we log a message about possible data loss
> and continue with recovering the edits in that file.
> In some deployments this potential for data loss might be unacceptable. In
> those situations it would be good to have a configurable setting that marks
> the recovery as failed instead. It should default to off (at least in branch-1).
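
The behavior described above can be sketched roughly as follows. This is only an illustration of the control flow, not HBase code: `recover_lease`, `try_recover`, `fail_on_timeout`, and `LeaseRecoveryError` are hypothetical names, and the actual setting name is not given in this issue.

```python
import time


class LeaseRecoveryError(Exception):
    """Hypothetical error: lease recovery timed out and the strict option is on."""


def recover_lease(try_recover, timeout_s, fail_on_timeout=False,
                  retry_pause_s=0.0, clock=time.monotonic, log=print):
    """Retry lease recovery until it succeeds or the timeout elapses.

    try_recover: hypothetical callable that returns True once the lease
        has been recovered.
    fail_on_timeout: hypothetical stand-in for the proposed setting;
        off by default, as the issue suggests.
    Returns True if the lease was recovered, False if the timeout elapsed
    and we continue anyway (the current behavior).
    """
    deadline = clock() + timeout_s
    while True:
        if try_recover():
            return True
        if clock() >= deadline:
            break
        time.sleep(retry_pause_s)

    if fail_on_timeout:
        # Proposed behavior: mark the recovery as failed instead of continuing.
        raise LeaseRecoveryError("lease recovery timed out; failing WAL recovery")

    # Current behavior: warn about possible data loss and carry on.
    log("WARN: lease recovery timed out; continuing, possible data loss")
    return False
```

In the scenario from the comment, where the FileSystem fails every lease recovery attempt, `try_recover` never returns True: with the default the caller proceeds after the timeout, with the strict option recovery is marked failed, and with no timeout at all the loop would never exit.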
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)