Thanks Tony, that makes sense.

On Sun, Nov 29, 2015 at 11:11 AM, Tony Kurc <[email protected]> wrote:
> Mark,
> I didn't write this code, but I believe that this code will trigger when
> the namenode is out to lunch (e.g. garbage collecting or a big queue of
> operations) and *maybe* safe mode. Hadoop code in general really doesn't
> have the concept of flow control (i.e. RPC feedback that it is too busy
> and you should come back later). If we're looping here, really, I don't
> think other HDFS operations are going to succeed.
>
> In any case, like the code Ricky contributed to periodically resync
> Kerberos, workarounds for iffy client APIs should be revisited every so
> often.
>
> On Sun, Nov 29, 2015 at 9:41 AM, Mark Petronic <[email protected]>
> wrote:
>
> > This is the sort of "mystery" code that really should have some explicit
> > code comments. :) What are the underlying reasons for this retry logic?
> > This could definitely lead to a bottleneck if this loop has to run and
> > sleep numerous times. Just wondering what, in HDFS, results in the need
> > to do this? Maybe what to do in the cluster setup/config to avoid
> > hitting this condition, if that is possible?
> >
> > Thanks
> >
> >     boolean renamed = false;
> >     for (int i = 0; i < 10; i++) { // try to rename multiple times
> >         if (hdfs.rename(tempCopyFile, copyFile)) {
> >             renamed = true;
> >             break; // rename was successful
> >         }
> >         Thread.sleep(200L); // try waiting to let whatever might
> >                             // cause rename failure to resolve
> >     }
> >     if (!renamed) {
> >         hdfs.delete(tempCopyFile, false);
> >         throw new ProcessException("Copied file to HDFS but could"
> >                 + " not rename dot file " + tempCopyFile
> >                 + " to its final filename");
> >     }
> >
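For anyone following along, the retry pattern under discussion can be sketched in isolation. This is a minimal, self-contained illustration, not the actual NiFi processor code: the `BooleanSupplier` stands in for `hdfs.rename(tempCopyFile, copyFile)`, and the simulated two failures before success are an assumption made purely for the demo.

```java
import java.util.function.BooleanSupplier;

public class RenameRetry {
    // Retry an operation that signals success via its boolean return,
    // sleeping between attempts. Mirrors the quoted loop: bounded tries
    // with a fixed pause to let a transient NameNode stall clear.
    static boolean retry(BooleanSupplier op, int attempts, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (op.getAsBoolean()) {
                return true; // operation succeeded, stop retrying
            }
            Thread.sleep(sleepMillis); // wait before the next attempt
        }
        return false; // exhausted; caller cleans up (e.g. deletes temp file)
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for hdfs.rename: fails twice, then succeeds.
        int[] calls = {0};
        boolean renamed = retry(() -> ++calls[0] >= 3, 10, 10L);
        System.out.println(renamed
                ? "renamed after " + calls[0] + " tries"
                : "rename failed");
    }
}
```

As Tony notes, if the rename keeps failing because the NameNode is overloaded, other HDFS calls are likely failing too, so a bounded loop with a final cleanup-and-throw is about as much as the client side can do.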
