[jira] [Commented] (ACCUMULO-3937) Hard-coded HDFS failure tolerance

Josh Elser (JIRA) Fri, 10 Jul 2015 17:38:14 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623132#comment-14623132 ]


Josh Elser commented on ACCUMULO-3937:
--------------------------------------

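I also increased this threshold from 5 to 15, because we only sleep for 100ms 
between attempts. At that rate, 5 failures can pile up in well under a second, 
far inside the 10-second window. We should probably use an increasing backoff 
between failed attempts so that this triggers more reliably.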
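
A minimal sketch of the increasing backoff suggested above, assuming a capped exponential schedule. The class name `WalRetryBackoff`, the `call` helper and the 5-second cap used below are hypothetical illustrations, not Accumulo's actual code; only the 100ms starting sleep comes from this comment.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch: capped exponential backoff between attempts to create
 * a new WAL. Not Accumulo's implementation; names and values are assumptions.
 */
final class WalRetryBackoff {
  private final long initialMillis; // first sleep, e.g. the current fixed 100ms
  private final long maxMillis;     // upper bound on any single sleep

  WalRetryBackoff(long initialMillis, long maxMillis) {
    if (initialMillis <= 0 || maxMillis < initialMillis) {
      throw new IllegalArgumentException("need 0 < initialMillis <= maxMillis");
    }
    this.initialMillis = initialMillis;
    this.maxMillis = maxMillis;
  }

  /** Sleep after failed attempt number {@code attempt} (0-based): initial * 2^attempt, capped. */
  long delayMillis(int attempt) {
    if (attempt < 0) {
      throw new IllegalArgumentException("attempt must be >= 0");
    }
    long delay = initialMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      // Double, but jump straight to the cap instead of overflowing past it.
      delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
    }
    return Math.min(delay, maxMillis);
  }

  /** Total time slept across the first {@code sleeps} backoff intervals. */
  long totalSleepMillis(int sleeps) {
    long total = 0;
    for (int i = 0; i < sleeps; i++) {
      total += delayMillis(i);
    }
    return total;
  }

  /** Runs {@code action}, sleeping with backoff between failures; rethrows the last failure. */
  <T> T call(Callable<T> action, int maxAttempts) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        if (attempt + 1 >= maxAttempts) {
          throw e; // out of attempts: surface the last error
        }
        Thread.sleep(delayMillis(attempt)); // InterruptedException propagates to the caller
      }
    }
  }
}
```

With `new WalRetryBackoff(100, 5000)`, the 14 sleeps between 15 attempts add up to 46,300ms, versus 1,400ms with a fixed 100ms sleep. Any count-within-a-window check would need to be tuned together with whatever schedule is chosen.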

> Hard-coded HDFS failure tolerance
> ---------------------------------
>
>                 Key: ACCUMULO-3937
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3937
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.1, 1.8.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> ACCUMULO-2480 added an error cache to the TabletServer, which makes the 
> tserver kill itself after 5 errors creating a new WAL file within 10 seconds.
> This is painful because it now causes Accumulo to kill itself if HDFS is 
> restarted beneath Accumulo. Previously, I would have expected Accumulo to 
> just keep on chugging if HDFS goes away. Now, I'll have to restart it when 
> HDFS returns.
> This should be a configuration property instead of being hard-coded.
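
A minimal sketch of what a configurable version of that check might look like, assuming it counts failures in a sliding time window and reports exhaustion once the count reaches the limit (whether "after 5 errors" means on the 5th or the 6th is an assumption here). `FailureTolerance` is a hypothetical name; in a real tserver the two constructor arguments would be read from configuration rather than hard-coded, but no actual Accumulo property is implied.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: tolerate up to maxFailures failures within a sliding
 * window before giving up. Both limits are meant to come from configuration.
 */
final class FailureTolerance {
  private final int maxFailures;
  private final long windowMillis;
  private final Deque<Long> failureTimes = new ArrayDeque<>(); // oldest first

  FailureTolerance(int maxFailures, Duration window) {
    if (maxFailures < 1) {
      throw new IllegalArgumentException("maxFailures must be >= 1");
    }
    this.maxFailures = maxFailures;
    this.windowMillis = window.toMillis();
    if (this.windowMillis < 1) {
      throw new IllegalArgumentException("window must be at least 1ms");
    }
  }

  /** Records a failure at {@code nowMillis}; returns true once the tolerance is exhausted. */
  synchronized boolean recordFailure(long nowMillis) {
    failureTimes.addLast(nowMillis);
    // Forget failures that have aged out of the window.
    while (!failureTimes.isEmpty() && nowMillis - failureTimes.peekFirst() >= windowMillis) {
      failureTimes.removeFirst();
    }
    return failureTimes.size() >= maxFailures;
  }

  /** Number of failures currently inside the window. */
  synchronized int recentFailures() {
    return failureTimes.size();
  }
}
```

Timestamps are passed in rather than read from the clock so the behavior is easy to test. With `new FailureTolerance(5, Duration.ofSeconds(10))`, five failures 100ms apart exhaust the tolerance, while five failures 3 seconds apart do not, because the first one has aged out by the fifth.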



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
