[
https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564534#comment-14564534
]
Vinayakumar B commented on HDFS-8270:
-------------------------------------
It seems the default retries were removed as well:
the client no longer retries even on connect exceptions.
IMO just the following changes will do.
In NameNodeProxies#createNNProxyWithClientProtocol(..), inside the {{withRetries}}
if block, make the changes below and leave everything else the same.
{code}
   if (withRetries) { // create the proxy with retries
-    RetryPolicy createPolicy = RetryPolicies
-        .retryUpToMaximumCountWithFixedSleep(5,
-            HdfsServerConstants.LEASE_SOFTLIMIT_PERIOD,
-            TimeUnit.MILLISECONDS);
-
-    Map<Class<? extends Exception>, RetryPolicy> remoteExceptionToPolicyMap
-        = new HashMap<Class<? extends Exception>, RetryPolicy>();
-    remoteExceptionToPolicyMap.put(AlreadyBeingCreatedException.class,
-        createPolicy);
-
-    RetryPolicy methodPolicy = RetryPolicies.retryByRemoteException(
-        defaultPolicy, remoteExceptionToPolicyMap);
     Map<String, RetryPolicy> methodNameToPolicyMap
         = new HashMap<String, RetryPolicy>();
-
-    methodNameToPolicyMap.put("create", methodPolicy);
     ClientProtocol translatorProxy =
         new ClientNamenodeProtocolTranslatorPB(proxy);
{code}
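To illustrate the intended effect, here is a minimal plain-Java sketch. It does not use the Hadoop classes: {{Policy}}, {{policyFor}} and the retry counts are hypothetical stand-ins, and it assumes the retry proxy falls back to the default policy for any method that has no entry in the method-name map. With the "create" entry gone, create() would simply get the default policy like every other call.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryPolicySketch {

    /** Hypothetical stand-in for a retry policy (not Hadoop's RetryPolicy). */
    static final class Policy {
        final String name;
        final int maxRetries;

        Policy(String name, int maxRetries) {
            this.name = name;
            this.maxRetries = maxRetries;
        }
    }

    /**
     * Assumed lookup rule: a per-method entry wins,
     * otherwise the default policy applies.
     */
    static Policy policyFor(String method,
                            Map<String, Policy> methodNameToPolicyMap,
                            Policy defaultPolicy) {
        Policy p = methodNameToPolicyMap.get(method);
        return p != null ? p : defaultPolicy;
    }

    public static void main(String[] args) {
        // Retry counts are illustrative values only.
        Policy defaultPolicy = new Policy("default", 10);
        Policy createPolicy = new Policy("create-specific", 5);

        // Before the change: "create" has its own hardcoded policy.
        Map<String, Policy> before = new HashMap<>();
        before.put("create", createPolicy);

        // After the change: the map stays empty.
        Map<String, Policy> after = new HashMap<>();

        System.out.println(policyFor("create", before, defaultPolicy).name); // create-specific
        System.out.println(policyFor("create", after, defaultPolicy).name);  // default
    }
}
```

Keeping the (now empty) map means the rest of the method can stay untouched, which is why only the lines marked with "-" need to go.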
> create() always retried with hardcoded timeout when file already exists
> -----------------------------------------------------------------------
>
> Key: HDFS-8270
> URL: https://issues.apache.org/jira/browse/HDFS-8270
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.6.0
> Reporter: Andrey Stepachev
> Assignee: J.Andreina
> Attachments: HDFS-8270.1.patch
>
>
> In HBase we stumbled on unexpected behaviour that could
> break things.
> HDFS-6478 fixed the wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true are forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (as hbase fsck did, which broke for us in HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or should it
> be enough to fail fast?