[
https://issues.apache.org/jira/browse/HDFS-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553607#comment-13553607
]
Steve Loughran commented on HDFS-4389:
--------------------------------------
# The retry policy Nicholas talks of will catch and handle SafeModeExceptions;
it has been tested with Pig, HBase and Hive.
# There's a Groovy Swing UI for seeing what's going on in the cluster (NN
probes plus blocking/non-blocking FS operations) in the "should go into
contrib" HA-monitor code: https://github.com/hortonworks/HA-Monitor
# The JT needs to be told to keep an eye on HDFS status (including safe mode)
and not overreact to failing tasks during an outage (so that timeouts &
failures don't trigger job failure or TT blacklisting).
# The full list of options is up at
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.0/bk_hdp1-system-admin-guide/content/ch_ha-redhat-deploy.html
Even with HDFS 2 and its active/passive failover, telling the layers above to
be resilient to failures, either through extended retry policies (the default)
or app-specific probes & policies (as the JT does), is good because it gives
you resilience to many of the other outages you can encounter (network
problems, failure of the rack containing both NNs, a whale falling out of the
sky (yes, it happens)). There's generally no reason not to flip the switch.
> Non-HA DFSClients do not attempt reconnects
> -------------------------------------------
>
> Key: HDFS-4389
> URL: https://issues.apache.org/jira/browse/HDFS-4389
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, hdfs-client
> Affects Versions: 2.0.0-alpha, 3.0.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> The HA retry policy implementation appears to have broken non-HA
> {{DFSClient}} connect retries. The ipc
> {{Client.Connection#handleConnectionFailure}} used to perform 45 connection
> attempts, but now it consults a retry policy. For non-HA proxies, the policy
> does not handle {{ConnectException}}.
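
The behaviour described in the issue can be illustrated with a minimal,
self-contained sketch. This is not Hadoop's actual RetryPolicy or ipc Client
code: the class ConnectRetrySketch, the Policy interface and both methods are
hypothetical names invented for illustration. It only shows the general idea
of a connection loop that consults a policy, and how a policy that does not
recognise ConnectException ends up with no reconnect attempts at all.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical sketch of policy-driven connect retries; not Hadoop's API. */
public final class ConnectRetrySketch {

    /** Decides whether a failed attempt should be retried. */
    public interface Policy {
        boolean shouldRetry(Exception e, int attemptsSoFar);
    }

    /**
     * Policy that retries only ConnectException, up to maxAttempts attempts
     * in total. The issue mentions 45 attempts as the historical behaviour.
     */
    public static Policy retryOnConnectFailure(final int maxAttempts) {
        return (e, attemptsSoFar) ->
                e instanceof ConnectException && attemptsSoFar < maxAttempts;
    }

    /** Runs the call, consulting the policy after every failure. */
    public static <T> T callWithRetries(Callable<T> call, Policy policy,
                                        long sleepMillis) throws Exception {
        int attempts = 0;
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                attempts++;
                // If the policy does not handle this exception type, the
                // failure propagates immediately, with no reconnect.
                if (!policy.shouldRetry(e, attempts)) {
                    throw e;
                }
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

With retryOnConnectFailure(45), a call that fails twice with ConnectException
and then succeeds returns normally on the third attempt. With a policy that
always answers false for ConnectException, the first failure is rethrown,
which is the symptom this issue reports for non-HA proxies.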