[
https://issues.apache.org/jira/browse/HIVE-28819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denys Kuzmenko updated HIVE-28819:
----------------------------------
Labels: hive-4.2.0-must pull-request-available replication (was:
pull-request-available replication)
> [HiveAcidReplication] copy gets stuck for hive.repl.retry.total.duration for
> FileNotFoundException
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-28819
> URL: https://issues.apache.org/jira/browse/HIVE-28819
> Project: Hive
> Issue Type: Bug
> Components: repl
> Affects Versions: 4.1.0
> Reporter: Harshal Patel
> Assignee: Harshal Patel
> Priority: Major
> Labels: hive-4.2.0-must, pull-request-available, replication
>
> In _*{{FileUtils.java}}*_, in the function
> {code:java}
> public static boolean copy(FileSystem srcFS, Path[] srcs, FileSystem dstFS,
> Path dst, boolean deleteSource, boolean overwrite, boolean preserveXAttr,
> Configuration conf) throws IOException
> {code}
> there is a block which copies the files:
>
> {code:java}
> try {
>   if (!copy(srcFS, srcFS.getFileStatus(src), dstFS, dst, deleteSource,
>       overwrite, preserveXAttr, conf)) {
>     returnVal = false;
>   }
> } catch (IOException e) {
>   gotException = true;
>   exceptions.append(e.getMessage());
>   exceptions.append("\n");
> }
> }
> if (gotException) {
>   throw new IOException(exceptions.toString());
> } else {
>   return returnVal;
> }
> }
> {code}
> So if a file gets deleted during the copy operation, the
> *_FileNotFoundException_* is wrapped in an *_IOException_*, which causes an
> issue because of the retryable function, which looks like this:
>
> {code:java}
> private <T> T retryableFxn(Callable<T> callable) throws IOException {
>   Retryable retryable = Retryable.builder()
>       .withHiveConf(hiveConf)
>       .withRetryOnException(IOException.class)
>       .withFailOnParentExceptionList(failOnParentExceptionList)
>       .build();
>   try {
>     return retryable.executeCallable(() -> callable.call());
>   } catch (Exception e) {
>     if (failOnParentExceptionList.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()))) {
>       throw new IOException(e);
>     }
>     throw new IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg(), e);
>   }
> }
> {code}
>
> where {{failOnParentExceptionList}} is defined as:
>
> {code:java}
> private List<Class<? extends Exception>> failOnParentExceptionList =
> Arrays.asList(org.apache.hadoop.fs.PathIOException.class,
> org.apache.hadoop.fs.UnsupportedFileSystemException.class,
> org.apache.hadoop.fs.InvalidPathException.class,
> org.apache.hadoop.fs.InvalidRequestException.class,
> org.apache.hadoop.fs.FileAlreadyExistsException.class,
> org.apache.hadoop.fs.ChecksumException.class,
> org.apache.hadoop.fs.ParentNotDirectoryException.class,
> org.apache.hadoop.hdfs.protocol.QuotaExceededException.class,
> FileNotFoundException.class);
> {code}
> Here, as you can see, if the retryable function hits a
> *_FileNotFoundException_* directly, it does not retry; but in the case above
> the exception is wrapped into a plain *_IOException_*, so the retry loop keeps
> retrying and gets stuck for the full hive.repl.retry.total.duration (24 hours).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)