[
https://issues.apache.org/jira/browse/HDFS-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Colin Patrick McCabe updated HDFS-6755:
---------------------------------------
Description: DFSOutputStream#close has a loop where it tries to contact the
NameNode, to call {{complete}} on the file which is open-for-write. This loop
includes a sleep which increases exponentially (exponential backoff). It makes
sense to sleep before re-contacting the NameNode, but the code also sleeps even
in the case where it has already decided to give up and throw an exception back
to the user. It should not sleep after it has already decided to give up,
since there's no point. (was: The following code in DFSOutputStream may have an
unnecessary sleep.
{code}
try {
  Thread.sleep(localTimeout);
  if (retries == 0) {
    throw new IOException("Unable to close file because the last block"
        + " does not have enough number of replicas.");
  }
  retries--;
  localTimeout *= 2;
  if (Time.now() - localstart > 5000) {
    DFSClient.LOG.info("Could not complete " + src + " retrying...");
  }
} catch (InterruptedException ie) {
  DFSClient.LOG.warn("Caught exception ", ie);
}
{code}
Currently, the code sleeps before throwing the exception, which should not be
the case.
The sleep time doubles on every iteration, so after more than one iteration the
code can sleep for a significant time just to throw an exception. We need to
move the sleep down after decrementing retries.)
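A minimal sketch of the reordering described above, not the attached patch: the give-up check runs before the sleep, so the failing path never waits. The class name {{CloseRetrySketch}}, the {{Sleeper}} interface, the {{completeFile}} signature, and the interrupt handling are all assumptions made for illustration so the loop can be run without a NameNode or real waiting.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

class CloseRetrySketch {
    /** Hypothetical stand-in for Thread.sleep so the loop can be exercised without waiting. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Calls tryComplete until it returns true, backing off exponentially.
     * Returns the total number of milliseconds slept.
     * Not the real DFSOutputStream code; names and signature are illustrative.
     */
    static long completeFile(BooleanSupplier tryComplete, int retries,
                             long initialTimeoutMs, Sleeper sleeper) throws IOException {
        long localTimeout = initialTimeoutMs;
        long totalSlept = 0;
        while (!tryComplete.getAsBoolean()) {
            // Decide whether to give up BEFORE sleeping, so no time is wasted
            // on the path that only throws.
            if (retries == 0) {
                throw new IOException("Unable to close file because the last block"
                    + " does not have enough number of replicas.");
            }
            retries--;
            try {
                sleeper.sleep(localTimeout);
            } catch (InterruptedException ie) {
                // Assumption: abort on interrupt. The original snippet logs and keeps looping.
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting to complete file", ie);
            }
            totalSlept += localTimeout;
            localTimeout *= 2; // exponential backoff
        }
        return totalSlept;
    }
}
```

With an illustrative initial timeout of 400 ms and 2 retries against a call that never succeeds, this version sleeps 400 + 800 ms and then throws immediately. The ordering in the original snippet would sleep a further 1600 ms before throwing.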
> There is an unnecessary sleep in the code path where DFSOutputStream#close
> gives up its attempt to contact the namenode
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-6755
> URL: https://issues.apache.org/jira/browse/HDFS-6755
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Reporter: Mit Desai
> Assignee: Mit Desai
> Attachments: HDFS-6755.patch
>
>
> DFSOutputStream#close has a loop where it tries to contact the NameNode, to
> call {{complete}} on the file which is open-for-write. This loop includes a
> sleep which increases exponentially (exponential backoff). It makes sense to
> sleep before re-contacting the NameNode, but the code also sleeps even in the
> case where it has already decided to give up and throw an exception back to
> the user. It should not sleep after it has already decided to give up, since
> there's no point.