[jira] [Commented] (HBASE-13602) Add an option to fail wal recovery when lease recovery fails

Sean Busbey (JIRA) Thu, 30 Apr 2015 14:05:25 -0700

    [ https://issues.apache.org/jira/browse/HBASE-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522270#comment-14522270 ]


Sean Busbey commented on HBASE-13602:
-------------------------------------

Having no timeout has the same kind of problem: folks who had 
slow-to-recover problems suddenly have hanging-forever problems.

For example, the cluster I saw this on definitely wouldn't have had data 
loss, because I manually ssh'd to each node and verified there were no old 
RS processes. My FileSystem instance was failing all lease recovery, so 
without the timeout it would never have recovered.

> Add an option to fail wal recovery when lease recovery fails
> ------------------------------------------------------------
>
>                 Key: HBASE-13602
>                 URL: https://issues.apache.org/jira/browse/HBASE-13602
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Sean Busbey
>              Labels: operability
>             Fix For: 2.0.0, 1.2.0
>
>
> Currently, if lease recovery doesn't succeed within an extended timeout 
> (default 15 minutes), we log a message about possible data loss and 
> continue with recovering the edits in that file.
> In some deployments this potential for data loss might be unacceptable. In 
> those situations it would be good to have a configurable setting that marks 
> the recovery as failed instead. It should default to off (at least in 
> branch-1).
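
To make the trade-off concrete, here is a minimal Java sketch of the two 
behaviours discussed in this issue. It is not HBase code: the class, enum, 
exception, and parameter names are all hypothetical, and the lease recovery 
call is stood in for by a plain BooleanSupplier. The timeout always bounds 
the retry loop; the policy only decides what happens once it expires.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names are HBase or HDFS APIs. */
final class LeaseRecoverySketch {

    /** What to do when the timeout expires before the lease is recovered. */
    enum OnTimeout { WARN_AND_CONTINUE, FAIL_RECOVERY }

    /** Thrown when the policy is FAIL_RECOVERY and the lease was not recovered. */
    static final class LeaseRecoveryFailedException extends RuntimeException {
        LeaseRecoveryFailedException(String message) {
            super(message);
        }
    }

    /**
     * Retries one lease recovery attempt until it succeeds or the timeout expires.
     *
     * @param attempt   a single recovery attempt; returns true once the lease is recovered
     * @param timeoutMs overall time budget in milliseconds
     * @param pauseMs   pause between attempts in milliseconds
     * @param policy    behaviour once the budget is exhausted
     * @return true if the lease was recovered, false if we gave up and the
     *         caller should continue anyway (possible data loss)
     */
    static boolean recoverLease(BooleanSupplier attempt, long timeoutMs, long pauseMs,
                                OnTimeout policy) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true; // lease recovered, safe to read the file
            }
            if (System.nanoTime() - deadline >= 0) {
                break; // budget exhausted; without this check we would loop forever
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new LeaseRecoveryFailedException("interrupted during lease recovery");
            }
        }
        if (policy == OnTimeout.FAIL_RECOVERY) {
            // Proposed opt-in behaviour: mark the recovery as failed.
            throw new LeaseRecoveryFailedException(
                "lease recovery did not succeed within " + timeoutMs + " ms");
        }
        // Current behaviour as described above: warn and carry on.
        System.err.println("WARN: lease recovery did not succeed within " + timeoutMs
            + " ms; possible data loss, continuing");
        return false;
    }
}
```

With an attempt that never succeeds, WARN_AND_CONTINUE returns false after 
the timeout and FAIL_RECOVERY throws. Dropping the deadline check instead 
gives the hanging-forever case from the comment above.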



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
