[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564534#comment-14564534 ]
Vinayakumar B commented on HDFS-8270:
-------------------------------------

Seems like the default retries also got removed. The client is not retrying even for connect exceptions.

IMO just the following changes will do: in NameNodeProxies#createNNProxyWithClientProtocol(..), inside the {{withRetries}} if block, make the changes below. Let everything else stay the same.
{code}
     if (withRetries) { // create the proxy with retries
-      RetryPolicy createPolicy = RetryPolicies
-          .retryUpToMaximumCountWithFixedSleep(5,
-              HdfsServerConstants.LEASE_SOFTLIMIT_PERIOD, TimeUnit.MILLISECONDS);
-
-      Map<Class<? extends Exception>, RetryPolicy> remoteExceptionToPolicyMap
-                 = new HashMap<Class<? extends Exception>, RetryPolicy>();
-      remoteExceptionToPolicyMap.put(AlreadyBeingCreatedException.class,
-          createPolicy);
-
-      RetryPolicy methodPolicy = RetryPolicies.retryByRemoteException(
-          defaultPolicy, remoteExceptionToPolicyMap);
       Map<String, RetryPolicy> methodNameToPolicyMap
                  = new HashMap<String, RetryPolicy>();
-
-      methodNameToPolicyMap.put("create", methodPolicy);
       ClientProtocol translatorProxy =
         new ClientNamenodeProtocolTranslatorPB(proxy);
{code}

> create() always retried with hardcoded timeout when file already exists
> -----------------------------------------------------------------------
>
>                 Key: HDFS-8270
>                 URL: https://issues.apache.org/jira/browse/HDFS-8270
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.6.0
>            Reporter: Andrey Stepachev
>            Assignee: J.Andreina
>         Attachments: HDFS-8270.1.patch
>
>
> In HBase we stumbled on unexpected behaviour that could
> break things.
> HDFS-6478 fixed a wrong exception
> translation, but that apparently led to unexpected behaviour:
> clients trying to create a file without override=true are forced
> to retry for a hardcoded amount of time (60 seconds).
> That could break or slow down systems that use the filesystem
> for locks (as hbase fsck did, which broke; see HBASE-13574).
> We should make this behaviour configurable: does the client really need
> to wait for the lease timeout to be sure that the file doesn't exist, or should it
> be enough to fail fast?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
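
For illustration only, here is a minimal sketch of the behaviour discussed in the comment above, in plain Java with no Hadoop dependency. The names RetrySketch, Policy, FileBusyException and run are hypothetical stand-ins (FileBusyException plays the role of AlreadyBeingCreatedException), and the 10 ms sleep stands in for the lease soft-limit period. It also assumes that "retry up to 5" means one initial call plus five retries. It is not the NameNodeProxies code, only a model of a per-exception retry policy layered over a default one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class RetrySketch {

    /** Hypothetical retry policy: retries allowed, with a fixed sleep between them. */
    static final class Policy {
        final int maxRetries;
        final long sleepMillis;

        Policy(int maxRetries, long sleepMillis) {
            this.maxRetries = maxRetries;
            this.sleepMillis = sleepMillis;
        }
    }

    /** Hypothetical stand-in for AlreadyBeingCreatedException. */
    static class FileBusyException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /**
     * Calls op until it succeeds or the applicable policy is exhausted.
     * The policy is looked up by the exception's class, falling back to
     * defaultPolicy. Returns the total number of attempts made.
     */
    static int run(Callable<Void> op, Policy defaultPolicy,
                   Map<Class<? extends Exception>, Policy> byException)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                op.call();
                return attempts; // succeeded
            } catch (Exception e) {
                Policy policy = byException.getOrDefault(e.getClass(), defaultPolicy);
                int retriesSoFar = attempts - 1;
                if (retriesSoFar >= policy.maxRetries) {
                    return attempts; // policy exhausted, give up
                }
                Thread.sleep(policy.sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A create() that always fails because the file is held by another client.
        Callable<Void> create = () -> { throw new FileBusyException(); };
        Policy failFast = new Policy(0, 0);

        // With a dedicated policy for the exception: 1 call + 5 retries.
        Map<Class<? extends Exception>, Policy> withCreatePolicy = new HashMap<>();
        withCreatePolicy.put(FileBusyException.class, new Policy(5, 10));
        System.out.println(run(create, failFast, withCreatePolicy)); // prints 6

        // With the dedicated policy removed, the default applies: fail fast.
        System.out.println(run(create, failFast, new HashMap<>())); // prints 1
    }
}
```

In this model, dropping the map entry for the exception is what turns a long wait into an immediate failure, which is the effect the removed lines in the diff are meant to have on create(). Whether other exceptions (such as connect exceptions) are still retried depends entirely on the default policy that remains in place.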