[ 
https://issues.apache.org/jira/browse/HIVE-28819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28819:
----------------------------------
    Labels: hive-4.2.0-must pull-request-available replication  (was: 
pull-request-available replication)

> [HiveAcidReplication] copy gets stuck for hive.repl.retry.total.duration for 
> FileNotFoundException
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28819
>                 URL: https://issues.apache.org/jira/browse/HIVE-28819
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 4.1.0
>            Reporter: Harshal Patel
>            Assignee: Harshal Patel
>            Priority: Major
>              Labels: hive-4.2.0-must, pull-request-available, replication
>
> In _*{{FileUtils.java}}*_, in the function
> {code:java}
> public static boolean copy(FileSystem srcFS, Path[] srcs, FileSystem dstFS, 
> Path dst, boolean deleteSource, boolean overwrite, boolean preserveXAttr, 
> Configuration conf) throws IOException 
> {code}
> there is this block, which copies the files:
>  
> {code:java}
> try {
>   if (!copy(srcFS, srcFS.getFileStatus(src), dstFS, dst, deleteSource, overwrite, preserveXAttr, conf)) {
>     returnVal = false;
>   }
> } catch (IOException var15) {
>   gotException = true;
>   exceptions.append(var15.getMessage()).append("\n");
> }
> // after the loop over srcs:
> if (gotException) {
>   throw new IOException(exceptions.toString());
> } else {
>   return returnVal;
> }
> {code}
> So if a file gets deleted during the copy operation, the resulting
> *_FileNotFoundException_* is wrapped in a plain *_IOException_*, and that causes
> a problem because of the retryable function, which looks like this:
>  
> {code:java}
> private <T> T retryableFxn(Callable<T> callable) throws IOException {
>   Retryable retryable = Retryable.builder()
>       .withHiveConf(hiveConf)
>       .withRetryOnException(IOException.class)
>       .withFailOnParentExceptionList(failOnParentExceptionList)
>       .build();
>   try {
>     return retryable.executeCallable(() -> callable.call());
>   } catch (Exception e) {
>     if (failOnParentExceptionList.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()))) {
>       throw new IOException(e);
>     }
>     throw new IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg(), e);
>   }
> }
> {code}
>  
>  
> {code:java}
> private List<Class<? extends Exception>> failOnParentExceptionList = Arrays.asList(
>     org.apache.hadoop.fs.PathIOException.class,
>     org.apache.hadoop.fs.UnsupportedFileSystemException.class,
>     org.apache.hadoop.fs.InvalidPathException.class,
>     org.apache.hadoop.fs.InvalidRequestException.class,
>     org.apache.hadoop.fs.FileAlreadyExistsException.class,
>     org.apache.hadoop.fs.ChecksumException.class,
>     org.apache.hadoop.fs.ParentNotDirectoryException.class,
>     org.apache.hadoop.hdfs.protocol.QuotaExceededException.class,
>     FileNotFoundException.class);
> {code}
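> To make the failure mode concrete, here is a minimal standalone sketch (not Hive
> code; the class name and path below are made up) showing that the anyMatch check
> above only looks at the class of the thrown exception itself, so a
> *_FileNotFoundException_* folded into a new *_IOException_* is invisible to it:
>  
> {code:java}
> import java.io.FileNotFoundException;
> import java.io.IOException;
> 
> public class WrappedExceptionDemo {
>   public static void main(String[] args) {
>     // FileUtils.copy only appends the message and throws a fresh IOException,
>     // so the original exception type is lost (it is not even kept as a cause).
>     IOException wrapped =
>         new IOException(new FileNotFoundException("/warehouse/tbl/delta_1/bucket_0").getMessage());
> 
>     // This mirrors the check in retryableFxn: it inspects e.getClass() directly.
>     boolean failFast = FileNotFoundException.class.isAssignableFrom(wrapped.getClass());
>     System.out.println(failFast); // false -> Retryable treats it as a retryable IOException
>   }
> }
> {code}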
> So although *_FileNotFoundException_* is in failOnParentExceptionList and should
> fail fast without retries, in the case above it reaches retryableFxn wrapped in a
> plain IOException, and the copy keeps retrying for the full
> hive.repl.retry.total.duration (24 hours).
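> One possible direction (just a sketch of the idea, not necessarily what the linked
> pull request does) is to stop folding *_FileNotFoundException_* into the aggregated
> IOException inside the copy loop, so the fail-fast list can actually see it:
>  
> {code:java}
> try {
>   if (!copy(srcFS, srcFS.getFileStatus(src), dstFS, dst, deleteSource, overwrite, preserveXAttr, conf)) {
>     returnVal = false;
>   }
> } catch (FileNotFoundException fnfe) {
>   // rethrow as-is so retryableFxn's failOnParentExceptionList check matches it
>   throw fnfe;
> } catch (IOException e) {
>   gotException = true;
>   exceptions.append(e.getMessage()).append("\n");
> }
> {code}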



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
