[
https://issues.apache.org/jira/browse/IMPALA-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy reassigned IMPALA-10658:
------------------------------------------
Assignee: Zoltán Borók-Nagy
> LOAD DATA INPATH silently fails between HDFS and Azure ABFS
> -----------------------------------------------------------
>
> Key: IMPALA-10658
> URL: https://issues.apache.org/jira/browse/IMPALA-10658
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
>
> LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to
> ABFS.
> The problem is that in 'relocateFile()' we try to figure out if 'sourceFile'
> is on the destination filesystem:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L246
> We use the following code to decide this:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L581-L591
> However, the Azure FileSystem implementation doesn't throw an exception in
> 'fs.makeQualified(path);'. It just happily returns a new Path, replacing the
> prefix "hdfs://" with "abfs://".
> So in relocateFile() Impala thinks the 'sourceFile' and 'destFile' are on the
> same filesystem so it tries to invoke 'destFs.rename()':
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L266
> From
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
> : "In terms of its implementation, it is the one with the most ambiguity
> regarding when to return false versus raising an exception."
> It seems the Azure FileSystem implementation doesn't throw an exception on
> failure, but returns false instead. Unfortunately, Impala doesn't check the
> return value of destFs.rename() (see above), so the error remains silent.
> To fix this issue we need to do two things:
> * fix FileSystemUtil.isPathOnFileSystem()
> * check the return value of destFs.rename() and throw an exception when it's
> false
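
The two fixes listed above could look roughly like the sketch below. It is an
illustration only, not the actual Impala patch: 'RelocateSketch',
'RenamingFs', 'isOnFileSystem()' and 'renameOrThrow()' are hypothetical names,
and plain java.net.URI values stand in for Hadoop Path/FileSystem objects so
the example has no Hadoop dependency. It assumes that comparing scheme and
authority is enough to decide whether a fully qualified path belongs to a
filesystem.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Objects;

final class RelocateSketch {
  /** Hypothetical stand-in for the rename() part of Hadoop's FileSystem. */
  @FunctionalInterface
  public interface RenamingFs {
    boolean rename(URI src, URI dst) throws IOException;
  }

  /**
   * Returns true only if 'path' has the same scheme and authority as 'fsUri'.
   * Unlike relying on makeQualified() to throw, this does not depend on how a
   * particular FileSystem implementation reacts to a foreign path.
   */
  public static boolean isOnFileSystem(URI path, URI fsUri) {
    String scheme = path.getScheme();
    // Assumption: callers pass fully qualified paths. An unqualified path is
    // treated as "not on this filesystem", the conservative answer here.
    if (scheme == null) return false;
    return scheme.equalsIgnoreCase(fsUri.getScheme())
        && Objects.equals(path.getAuthority(), fsUri.getAuthority());
  }

  /** Turns a 'false' return value from rename() into an exception. */
  public static void renameOrThrow(RenamingFs fs, URI src, URI dst)
      throws IOException {
    if (!fs.rename(src, dst)) {
      throw new IOException("Failed to rename " + src + " to " + dst);
    }
  }
}
```

With a check like this, an "hdfs://" source is not mistaken for a file on an
"abfs://" destination, and a rename() that returns false surfaces as an
IOException instead of being silently ignored.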
--
This message was sent by Atlassian Jira
(v8.3.4#803005)