[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564534#comment-14564534 ]

Vinayakumar B commented on HDFS-8270:
-------------------------------------

It seems the default retries were also removed: the client no longer retries even on connect exceptions.

IMO, just the following changes will do.

In NameNodeProxies#createNNProxyWithClientProtocol(..), make the changes below inside the {{withRetries}} if block. Leave everything else as is.
{code}
     if (withRetries) { // create the proxy with retries
 
-      RetryPolicy createPolicy = RetryPolicies
-          .retryUpToMaximumCountWithFixedSleep(5,
-              HdfsServerConstants.LEASE_SOFTLIMIT_PERIOD, TimeUnit.MILLISECONDS);
-    
-      Map<Class<? extends Exception>, RetryPolicy> remoteExceptionToPolicyMap 
-                 = new HashMap<Class<? extends Exception>, RetryPolicy>();
-      remoteExceptionToPolicyMap.put(AlreadyBeingCreatedException.class,
-          createPolicy);
-
-      RetryPolicy methodPolicy = RetryPolicies.retryByRemoteException(
-          defaultPolicy, remoteExceptionToPolicyMap);
       Map<String, RetryPolicy> methodNameToPolicyMap 
                  = new HashMap<String, RetryPolicy>();
-    
-      methodNameToPolicyMap.put("create", methodPolicy);
 
       ClientProtocol translatorProxy =
         new ClientNamenodeProtocolTranslatorPB(proxy);
{code}
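To make the effect of this diff concrete, here is a toy Java model. It is not Hadoop's RetryPolicy/RetryProxy API; {{CreateRetryModel}}, {{Failure}} and the predicate-based policies are hypothetical stand-ins. It assumes, as the comment implies, that a method with no entry in {{methodNameToPolicyMap}} falls back to the default policy, and that the default policy retries connection failures. Under those assumptions, dropping the "create" entry keeps connect retries for create() but makes it fail fast when the file is already being created.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model (hypothetical names) of per-method retry policy lookup. */
public class CreateRetryModel {

    /** Hypothetical failure kinds standing in for real exception classes. */
    enum Failure { CONNECT, ALREADY_BEING_CREATED }

    /** Default policy in this model: retry connection failures only. */
    static final Predicate<Failure> DEFAULT_POLICY = f -> f == Failure.CONNECT;

    /** Mirrors the removed wrapper: extra retries for one exception, default otherwise. */
    static final Predicate<Failure> CREATE_POLICY =
            f -> f == Failure.ALREADY_BEING_CREATED || DEFAULT_POLICY.test(f);

    static boolean shouldRetry(String method, Failure failure,
                               Map<String, Predicate<Failure>> methodNameToPolicyMap) {
        // Methods with no explicit entry fall back to the default policy.
        return methodNameToPolicyMap.getOrDefault(method, DEFAULT_POLICY).test(failure);
    }

    public static void main(String[] args) {
        Map<String, Predicate<Failure>> before = new HashMap<>();
        before.put("create", CREATE_POLICY);
        Map<String, Predicate<Failure>> after = new HashMap<>(); // "create" entry removed

        for (Failure f : Failure.values()) {
            System.out.printf("%s: before=%b after=%b%n", f,
                    shouldRetry("create", f, before), shouldRetry("create", f, after));
        }
    }
}
```

Running main prints `CONNECT: before=true after=true` and `ALREADY_BEING_CREATED: before=true after=false`, i.e. in this model the fix removes only the extra retry on an existing file.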

> create() always retried with hardcoded timeout when file already exists
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8270
>                 URL: https://issues.apache.org/jira/browse/HDFS-8270
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.6.0
>            Reporter: Andrey Stepachev
>            Assignee: J.Andreina
>         Attachments: HDFS-8270.1.patch
>
>
> In HBase we stumbled on unexpected behaviour that could
> break things.
> HDFS-6478 fixed a wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without overwrite=true are forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (as hbase fsck did; it broke for us in HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait out the lease timeout to be sure the file doesn't exist, or should
> it be enough to fail fast?
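The wait described above can be modelled with a small Java sketch. The names ({{FixedSleepRetry}}, {{AlreadyExistsException}}, the {{failFast}} flag) are hypothetical and not HDFS's RetryPolicies API. The retry count of 5 mirrors the {{retryUpToMaximumCountWithFixedSleep(5, ...)}} call removed in the diff above, and the 60-second sleep is the figure the report cites. In this model every retry sleeps the full period, so a create() on a file that already exists fails only after 5 x 60 s = 300 s, while the fail-fast option the report asks for fails at once.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Toy model (hypothetical names) of a fixed-sleep retry on "file exists" failures. */
public class FixedSleepRetry {

    /** Hypothetical stand-in for the "file already exists / being created" error. */
    static class AlreadyExistsException extends Exception {
        AlreadyExistsException(String msg) { super(msg); }
    }

    /** One create attempt that may fail because the file already exists. */
    interface Attempt {
        void run() throws AlreadyExistsException;
    }

    /**
     * Runs the attempt, retrying up to maxRetries times with a fixed sleep between
     * attempts. With failFast set, the first failure is rethrown immediately.
     * The sleeper is injected so the demo records the wait instead of blocking.
     */
    static void runWithRetries(Attempt attempt, boolean failFast, int maxRetries,
                               long sleepMillis, LongConsumer sleeper)
            throws AlreadyExistsException {
        for (int retries = 0; ; retries++) {
            try {
                attempt.run();
                return;
            } catch (AlreadyExistsException e) {
                if (failFast || retries >= maxRetries) {
                    throw e; // give up and surface the original failure
                }
                sleeper.accept(sleepMillis);
            }
        }
    }

    public static void main(String[] args) {
        long sleepMillis = TimeUnit.SECONDS.toMillis(60); // 60 s, as cited in the report
        Attempt fileAlwaysExists = () -> {
            throw new AlreadyExistsException("file exists");
        };
        for (boolean failFast : new boolean[] {false, true}) {
            long[] waited = {0};
            try {
                runWithRetries(fileAlwaysExists, failFast, 5, sleepMillis, ms -> waited[0] += ms);
            } catch (AlreadyExistsException e) {
                System.out.printf("failFast=%b: failed after waiting %d s%n",
                        failFast, TimeUnit.MILLISECONDS.toSeconds(waited[0]));
            }
        }
    }
}
```

Running main prints `failFast=false: failed after waiting 300 s` and `failFast=true: failed after waiting 0 s`. Injecting the sleeper keeps the demo instantaneous; a real client would block for the whole period instead.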



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
