Harshal Patel created HIVE-28819:
------------------------------------

             Summary: [HiveAcidReplication] copy gets stuck for 
hive.repl.retry.total.duration for FileNotFoundException
                 Key: HIVE-28819
                 URL: https://issues.apache.org/jira/browse/HIVE-28819
             Project: Hive
          Issue Type: Bug
          Components: repl
    Affects Versions: 4.1.0
            Reporter: Harshal Patel
            Assignee: Harshal Patel


In _*{{FileUtils.java}}*_, in the function


{code:java}
public static boolean copy(FileSystem srcFS, Path[] srcs, FileSystem dstFS, 
Path dst, boolean deleteSource, boolean overwrite, boolean preserveXAttr, 
Configuration conf) throws IOException 
{code}

there is the following block that copies the files:

{code:java}
try {
  if (!copy(srcFS, srcFS.getFileStatus(src), dstFS, dst, deleteSource,
      overwrite, preserveXAttr, conf)) {
    returnVal = false;
  }
} catch (IOException var15) {
  gotException = true;
  exceptions.append(var15.getMessage());
  exceptions.append("\n");
}
...
if (gotException) {
  throw new IOException(exceptions.toString());
} else {
  return returnVal;
}
{code}

So if a file gets deleted during the copy operation, the *_FileNotFoundException_* is wrapped in an *_IOException_*. That causes a problem in the retryable function, which looks like this:

{code:java}
private <T> T retryableFxn(Callable<T> callable) throws IOException {
  Retryable retryable = Retryable.builder()
      .withHiveConf(hiveConf)
      .withRetryOnException(IOException.class)
      .withFailOnParentExceptionList(failOnParentExceptionList)
      .build();
  try {
    return retryable.executeCallable(() -> callable.call());
  } catch (Exception e) {
    if (failOnParentExceptionList.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()))) {
      throw new IOException(e);
    }
    throw new IOException(ErrorMsg.REPL_FILE_SYSTEM_OPERATION_RETRY.getMsg(), e);
  }
}
{code}

where {{failOnParentExceptionList}} is defined as:
 
 

{code:java}
private List<Class<? extends Exception>> failOnParentExceptionList = Arrays.asList(
    org.apache.hadoop.fs.PathIOException.class,
    org.apache.hadoop.fs.UnsupportedFileSystemException.class,
    org.apache.hadoop.fs.InvalidPathException.class,
    org.apache.hadoop.fs.InvalidRequestException.class,
    org.apache.hadoop.fs.FileAlreadyExistsException.class,
    org.apache.hadoop.fs.ChecksumException.class,
    org.apache.hadoop.fs.ParentNotDirectoryException.class,
    org.apache.hadoop.hdfs.protocol.QuotaExceededException.class,
    FileNotFoundException.class);
{code}

As you can see, if a *_FileNotFoundException_* is hit directly, the operation is not retried. But in the case above the exception is wrapped into an *_IOException_*, so the fail-fast check never matches and the copy keeps retrying until {{hive.repl.retry.total.duration}} (24 hours) elapses.
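For illustration, here is a minimal, self-contained sketch (not the actual Hive {{Retryable}} implementation; the class, method names and path are made up) of why the top-level class check misses the wrapped exception, and how also walking the cause chain would let the copy fail fast instead of retrying:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not the Hive Retryable class.
public class WrappedExceptionCheckSketch {

  // Same idea as failOnParentExceptionList, reduced to one entry for brevity.
  private static final List<Class<? extends Exception>> FAIL_FAST =
      Arrays.asList(FileNotFoundException.class);

  // Current behaviour: only the top-level exception class is inspected.
  static boolean failsFastTopLevel(Exception e) {
    return FAIL_FAST.stream().anyMatch(k -> k.isAssignableFrom(e.getClass()));
  }

  // Possible direction for a fix (an assumption, not a committed patch):
  // walk the cause chain so a FileNotFoundException wrapped in an
  // IOException still short-circuits the retry loop.
  static boolean failsFastCauseChain(Throwable e) {
    for (Throwable t = e; t != null; t = t.getCause()) {
      Class<?> cls = t.getClass();
      if (FAIL_FAST.stream().anyMatch(k -> k.isAssignableFrom(cls))) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulate the copy path: the FileNotFoundException reaches the retry
    // logic wrapped inside an IOException (the path below is made up).
    IOException wrapped = new IOException(
        new FileNotFoundException("/warehouse/tbl/delta_0000001_0000001/bucket_00000"));
    System.out.println(failsFastTopLevel(wrapped));   // false -> keeps retrying
    System.out.println(failsFastCauseChain(wrapped)); // true  -> would fail fast
  }
}
{code}

Note that the quoted {{FileUtils.copy}} block only keeps the message of the original exception, so preserving the original exception (or attaching it as the cause) would also be needed for any cause-based check to work.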


