[ https://issues.apache.org/jira/browse/AMBARI-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615924#comment-14615924 ]
Alejandro Fernandez commented on AMBARI-11192:
----------------------------------------------
Need to revert this patch.
https://reviews.apache.org/r/36231/
In the case of an HA cluster where the former primary NN was killed "dirty", by
catastrophic power-down or equivalent, and the cluster has successfully failed
over to the other NN, a client that first attempts to contact the dead NN takes
10 minutes to switch to the other NN.
In Ambari 2.0 and HDP 2.2, dfs.client.retry.policy.enabled was not set at all.
Recently, in Ambari 2.1 for HDP 2.3, it was defaulted to true as part of
AMBARI-11192.
However, this causes problems during Rolling Upgrade (RU).
In an HA setup, retries should actually be handled by RetryInvocationHandler
using the retry policy FailoverOnNetworkExceptionRetry. The client first
translates the nameservice ID into two host names and creates an individual RPC
proxy for each NameNode. Each individual NameNode proxy still uses
MultipleLinearRandomRetry as its local retry policy, but because we usually set
dfs.client.retry.policy.enabled to false, this internal retry is actually
disabled. Then, if we hit any connection issue or remote exception
(including StandbyException), the exception is caught by RetryInvocationHandler
and handled according to FailoverOnNetworkExceptionRetry. In this way the
client can fail over to the other NameNode immediately instead of repeatedly
retrying the same NameNode.
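
To make the two retry layers concrete, here is a minimal Python sketch of the
flow described above. It is a toy model, not Hadoop code: the function names,
the host names nn1/nn2 and the inner_retries parameter are all hypothetical
stand-ins, and no real sleeping is done. With inner_retries=0 (the internal
retry disabled) the client moves to the second NameNode after one failed
attempt; any larger value keeps it on the dead NameNode first.

```python
class ConnectError(Exception):
    """Stand-in for a connection failure to a NameNode."""


def call_namenode(host, alive):
    # Hypothetical RPC: succeeds only if the host is in the 'alive' set.
    if host not in alive:
        raise ConnectError(host)
    return f"served by {host}"


def call_with_inner_retry(host, alive, inner_retries, log):
    # Models the per-NameNode proxy's local retry policy.
    # inner_retries=0 corresponds to the internal retry being disabled.
    for attempt in range(inner_retries + 1):
        log.append(host)
        try:
            return call_namenode(host, alive)
        except ConnectError:
            if attempt == inner_retries:
                raise  # only now does the outer layer see the failure


def call_with_failover(hosts, alive, inner_retries):
    # Models the outer failover layer: try each NameNode proxy in turn.
    log = []
    for host in hosts:
        try:
            return call_with_inner_retry(host, alive, inner_retries, log), log
        except ConnectError:
            continue
    raise ConnectError("all NameNodes unreachable")


if __name__ == "__main__":
    # nn1 is dead, nn2 is the new active NameNode.
    for retries in (0, 16):
        result, log = call_with_failover(["nn1", "nn2"], {"nn2"}, retries)
        print(retries, result, "attempts on nn1:", log.count("nn1"))
```

In the real client each of those inner attempts is separated by a sleep, which
is where the delay before failover comes from.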
However, because we set dfs.client.retry.policy.enabled to true here,
MultipleLinearRandomRetry is triggered inside the internal NameNode proxy, so
we have to wait 10+ minutes. The exception is thrown to RetryInvocationHandler
only after all the retries of MultipleLinearRandomRetry fail.
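
As a rough check on the "10+ minutes" figure, the sketch below sums the nominal
sleep budget of a retry spec written as "sleepMs,retries" pairs. The spec
string "10000,6,60000,10" is an assumption: it is what I understand the default
of dfs.client.retry.policy.spec to be, but it is not stated in this comment.
The helper name is hypothetical, and the real policy randomizes each sleep, so
treat the result as a nominal value only.

```python
def nominal_retry_budget_ms(spec):
    """Sum sleep_ms * retries over the 'sleep_ms,retries,...' pairs in spec."""
    values = [int(v) for v in spec.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("spec must contain sleep/retry pairs")
    pairs = zip(values[0::2], values[1::2])
    return sum(sleep_ms * retries for sleep_ms, retries in pairs)


# Assumed default spec: 6 retries at 10 s, then 10 retries at 60 s.
ASSUMED_DEFAULT_SPEC = "10000,6,60000,10"

if __name__ == "__main__":
    total_ms = nominal_retry_budget_ms(ASSUMED_DEFAULT_SPEC)
    print(total_ms / 60000, "minutes")  # 660000 ms, i.e. 11.0 minutes
```

Under that assumption the inner policy spends about 11 minutes on the dead
NameNode before the exception reaches the failover layer, which is consistent
with the delay reported above.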
> The Default hdfs-site.xml Should Have Client Retry Logic Enabled For Rolling
> Upgrade
> ------------------------------------------------------------------------------------
>
> Key: AMBARI-11192
> URL: https://issues.apache.org/jira/browse/AMBARI-11192
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Blocker
> Fix For: 2.1.0
>
>
> Install HDP 2.2.0 (champlain using Ambari 2.1)
> Register and Install new version of HDP Dal
> Click on Perform Upgrade
> Observed Error on UI:
> Reason: The hdfs-site.xml property dfs.client.retry.policy.enabled should be
> set to true.
> Failed on: HDFS
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)