[ https://issues.apache.org/jira/browse/HDFS-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203171#comment-13203171 ]
Todd Lipcon commented on HDFS-2910:
-----------------------------------
In order to make the NN ride over a hiccup, it seems the solution is to add a
more resilient JournalSet implementation -- i.e., either one that operates over a
quorum of shared dirs, or one that has a more stubborn retry policy. Given
that NFS itself already has built-in retries and can be configured with arbitrary
timeouts, it doesn't seem like we should worry about short hiccups -- any
outage that makes it past the configured NFS retries/timeouts is likely to be
worth causing a failover IMO.
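The two options above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not Hadoop's actual JournalSet API: `QuorumJournalSketch`, its nested `Journal` interface, and the `maxAttempts` / `retryDelayMillis` parameters are names invented here. Each shared dir is retried a bounded number of times (the "stubborn" retry policy), and a write counts as durable only if a strict majority of dirs acknowledge it (the quorum idea). A `false` result marks the point where the NN would stop riding over the hiccup and let a failover happen.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a quorum-based journal writer with a bounded
 * retry policy. None of these names are Hadoop APIs.
 */
class QuorumJournalSketch {

    /** Hypothetical stand-in for one shared edits dir. */
    interface Journal {
        void write(String record) throws IOException;
    }

    private final List<Journal> journals;
    private final int maxAttempts;        // "stubborn" retry budget per journal
    private final long retryDelayMillis;  // pause between failed attempts

    QuorumJournalSketch(List<Journal> journals, int maxAttempts, long retryDelayMillis) {
        if (journals.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need >= 1 journal and >= 1 attempt");
        }
        this.journals = new ArrayList<>(journals);
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Retries one journal until it succeeds or the attempt budget is used up. */
    boolean writeWithRetry(Journal journal, String record) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                journal.write(record);
                return true;
            } catch (IOException e) {
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(retryDelayMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
            }
        }
        return false;
    }

    /**
     * Writes the record to every journal and succeeds only if a strict
     * majority acknowledged it; false means the outage outlasted the retries.
     */
    boolean writeToQuorum(String record) {
        int acks = 0;
        for (Journal journal : journals) {
            if (writeWithRetry(journal, record)) {
                acks++;
            }
        }
        return acks > journals.size() / 2;
    }
}
```

For simplicity the sketch writes to the journals one after another; a real quorum writer would presumably issue the writes in parallel and return as soon as a majority acknowledged, so that one slow dir does not stall the rest.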
> HA: Active NN reports Bad state: BETWEEN_LOG_SEGMENTS when shared edits dir
> is inaccessible during log roll
> -----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-2910
> URL: https://issues.apache.org/jira/browse/HDFS-2910
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Bikas Saha
> Assignee: Bikas Saha
>