Harshal Patel created HIVE-28819:
------------------------------------
Summary: [HiveAcidReplication] copy gets stuck for
hive.repl.retry.total.duration for FileNotFoundException
Key: HIVE-28819
URL: https://issues.apache.org/jira/browse/HIVE-28819
Project: Hive
Issue Type: Bug
Components: repl
Affects Versions: 4.1.0
Reporter: Harshal Patel
Assignee: Harshal Patel
In _*{{FileUtils.java}}*_, in the function
{code:java}
public static boolean copy(FileSystem srcFS, Path[] srcs, FileSystem dstFS,
Path dst, boolean deleteSource, boolean overwrite, boolean preserveXAttr,
Configuration conf) throws IOException
{code}
there is a block which copies the files:
{code:java}
try {
  if (!copy(srcFS, srcFS.getFileStatus(src), dstFS, dst, deleteSource,
      overwrite, preserveXAttr, conf)) {
    returnVal = false;
  }
} catch (IOException e) {
  gotException = true;
  exceptions.append(e.getMessage());
  exceptions.append("\n");
}
// ...
if (gotException) {
  throw new IOException(exceptions.toString());
} else {
  return returnVal;
}
{code}
So if a file gets deleted during the copy operation, the resulting *_FileNotFoundException_* is swallowed: only its message is appended and a fresh *_IOException_* is thrown. That causes a problem in the retryable function, which looks like this:
{code:java}
private <T> T retryableFxn(Callable<T> callable) throws IOException {
  Retryable retryable = Retryable.builder()
      .withHiveConf(hiveConf)
      .withRetryOnException(IOException.class)
      .withFailOnParentExceptionList(failOnParentExceptionList)
      .build();
  try {
    return retryable.executeCallable(() -> callable.call());
  } catch (Exception e) {
    if (failOnParentExceptionList.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()))) {
      throw new IOException(e);
    }
    throw new IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg(), e);
  }
}
{code}
where {{failOnParentExceptionList}} is defined as:
{code:java}
private List<Class<? extends Exception>> failOnParentExceptionList =
Arrays.asList(org.apache.hadoop.fs.PathIOException.class,
org.apache.hadoop.fs.UnsupportedFileSystemException.class,
org.apache.hadoop.fs.InvalidPathException.class,
org.apache.hadoop.fs.InvalidRequestException.class,
org.apache.hadoop.fs.FileAlreadyExistsException.class,
org.apache.hadoop.fs.ChecksumException.class,
org.apache.hadoop.fs.ParentNotDirectoryException.class,
org.apache.hadoop.hdfs.protocol.QuotaExceededException.class,
FileNotFoundException.class);
{code}
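The fail-fast check above matches only the class of the exception that is actually thrown. The following standalone sketch (class and method names are hypothetical, and the list is reduced to one entry for brevity) shows why the check misses the {{FileUtils.copy}} case: a new {{IOException}} built from the message alone carries neither the {{FileNotFoundException}} type nor a cause link.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class WrappedExceptionDemo {
    // Reduced stand-in for failOnParentExceptionList (hypothetical, one entry)
    static final List<Class<? extends Exception>> FAIL_FAST =
            Arrays.asList(FileNotFoundException.class);

    // The same shape of test the retry logic applies to the thrown exception
    static boolean failsFast(Exception e) {
        return FAIL_FAST.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()));
    }

    public static void main(String[] args) {
        Exception direct = new FileNotFoundException("/tmp/f");
        // FileUtils.copy builds a fresh IOException from the messages only,
        // so the original exception type (and any cause link) is lost
        Exception flattened = new IOException(direct.getMessage());

        System.out.println(failsFast(direct));    // true  -> fail fast, no retry
        System.out.println(failsFast(flattened)); // false -> keeps retrying
    }
}
```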
As you can see, a *_FileNotFoundException_* is supposed to fail fast without retrying. But in the case above it arrives as a plain *_IOException_*, which is retryable, so the copy keeps retrying and gets stuck for the full {{hive.repl.retry.total.duration}} (24 hours in this case).
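One possible direction (a sketch only, not the actual patch) is to make the fail-fast check walk the exception's cause chain. Note this only helps if the wrapping side is also changed to preserve the original exception as the cause instead of flattening it to a message, as {{FileUtils.copy}} currently does. The class and method names below are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class CauseChainCheck {
    // Walk the cause chain so a wrapped fail-fast exception is still detected.
    // Only effective when the wrapper keeps the original exception as its cause.
    static boolean failsFastDeep(Throwable e, List<Class<? extends Exception>> failList) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            final Class<?> c = t.getClass();
            if (failList.stream().anyMatch(k -> k.isAssignableFrom(c))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Class<? extends Exception>> failList =
                Arrays.asList(FileNotFoundException.class);
        // Cause preserved: the wrapped FileNotFoundException is found
        Exception withCause = new IOException(new FileNotFoundException("/tmp/f"));
        // Message-only wrapping (current FileUtils.copy behavior): nothing to find
        Exception flattened = new IOException("File /tmp/f does not exist");

        System.out.println(failsFastDeep(withCause, failList)); // true
        System.out.println(failsFastDeep(flattened, failList)); // false
    }
}
```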
--
This message was sent by Atlassian Jira
(v8.20.10#820010)