Mark,
I didn't write this code, but I believe it will trigger when the namenode
is out to lunch (e.g. garbage collecting, working through a big queue of
operations, and *maybe* safe mode). Hadoop code in general really doesn't
have the concept of flow control (i.e. RPC feedback that it is too busy
and you should come back later). If we're looping here, I really don't
think other HDFS operations are going to succeed either.

In any case, like the code Ricky contributed to periodically resync
Kerberos, workarounds for iffy client APIs should be revisited every so
often.
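
For what it's worth, here is a rough sketch of what a revisited version of
that workaround might look like: the same rename loop, but with exponential
backoff instead of a fixed 200 ms sleep, so a briefly-busy namenode gets
progressively more breathing room. The helper name, the attempt count, and
the 5 second cap are just illustrative, not anything in the current codebase:

    // Hypothetical helper (assumes org.apache.hadoop.fs.FileSystem and Path):
    // retry FileSystem.rename() with exponential backoff rather than a fixed
    // 200 ms sleep between attempts.
    private boolean renameWithBackoff(final FileSystem hdfs, final Path src,
            final Path dst, final int maxAttempts)
            throws IOException, InterruptedException {
        long backoffMillis = 200L;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (hdfs.rename(src, dst)) {
                return true; // rename succeeded
            }
            Thread.sleep(backoffMillis);
            backoffMillis = Math.min(backoffMillis * 2, 5000L); // cap the wait at 5 s
        }
        return false; // caller can delete the dot file and throw ProcessException
    }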

On Sun, Nov 29, 2015 at 9:41 AM, Mark Petronic <[email protected]>
wrote:

> This is the sort of "mystery" code that really should have some explicit
> code comments. :) What are the underlying reasons for this retry logic?
> This could definitely lead to a bottleneck if this loop has to run and
> sleep numerous times. Just wondering what, in HDFS, results in the need to
> do this? Maybe what to do in the cluster setup/config to avoid hitting
> this condition, if that is possible?
>
> Thanks
>
>             boolean renamed = false;
>             for (int i = 0; i < 10; i++) { // try to rename multiple times.
>                 if (hdfs.rename(tempCopyFile, copyFile)) {
>                     renamed = true;
>                     break;// rename was successful
>                 }
>                 Thread.sleep(200L);// try waiting to let whatever might cause rename failure to resolve
>             }
>             if (!renamed) {
>                 hdfs.delete(tempCopyFile, false);
>                 throw new ProcessException("Copied file to HDFS but could not rename dot file "
>                         + tempCopyFile + " to its final filename");
>             }
>
