Zoltán Borók-Nagy created IMPALA-10658:
------------------------------------------

             Summary: LOAD DATA INPATH silently fails between HDFS and Azure ABFS
                 Key: IMPALA-10658
                 URL: https://issues.apache.org/jira/browse/IMPALA-10658
             Project: IMPALA
          Issue Type: Bug
            Reporter: Zoltán Borók-Nagy


LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to 
ABFS.

The problem is that in 'relocateFile()' we try to figure out if 'sourceFile' is 
on the destination filesystem:
https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L246
We use the following code to decide this:
https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L581-L591

However, the Azure FileSystem implementation doesn't throw an exception in 
'fs.makeQualified(path);'. It just happily returns a new Path, replacing the 
prefix "hdfs://" with "abfs://".
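
One possible way to make the check independent of how a given FileSystem
implements makeQualified() is to compare the scheme and authority of the path
with those of the filesystem URI. The following is only a minimal sketch under
that assumption, not the actual Impala fix: it uses plain java.net.URI instead
of Hadoop's Path and FileSystem so that it is self-contained, and the class
name 'PathFsCheck', the host names and the container name are hypothetical.

```java
import java.net.URI;

class PathFsCheck {
    /**
     * Returns true if 'path' can be considered to live on the filesystem
     * identified by 'fsUri'. Sketch only: compares scheme and authority
     * instead of relying on makeQualified() throwing an exception.
     */
    public static boolean isPathOnFileSystem(URI path, URI fsUri) {
        String scheme = path.getScheme();
        // An unqualified path (no scheme) is resolved against the filesystem.
        if (scheme == null) return true;
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) return false;
        String authority = path.getAuthority();
        // A path such as hdfs:///x carries no authority; assume the default one.
        if (authority == null) return true;
        // Assumption of this sketch: authorities are compared ignoring case.
        return authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        // Hypothetical locations, for illustration only.
        URI abfs = URI.create("abfs://container@account.dfs.core.windows.net/");
        URI source = URI.create("hdfs://namenode:8020/tmp/data.parq");
        URI dest = URI.create(
            "abfs://container@account.dfs.core.windows.net/warehouse/t");

        System.out.println(isPathOnFileSystem(source, abfs)); // prints "false"
        System.out.println(isPathOnFileSystem(dest, abfs));   // prints "true"
    }
}
```

With a check like this, an "hdfs://" source is never reported as being on an
"abfs://" filesystem, regardless of what makeQualified() returns.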

So in relocateFile() Impala thinks that 'sourceFile' and 'destFile' are on the 
same filesystem, and it tries to invoke 'destFs.rename()':
https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L266

From the Hadoop FileSystem specification of rename()
(https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29):
"In terms of its implementation, it is the one with the most ambiguity 
regarding when to return false versus raising an exception."

It seems that the Azure FileSystem implementation doesn't throw an exception on 
failure, but returns false instead. Unfortunately, Impala doesn't check the 
return value of destFs.rename() (see above), so the error remains silent.

To fix this issue we need to do two things:
* fix FileSystemUtil.isPathOnFileSystem()
* check the return value of destFs.rename() and throw an exception when it's 
false
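
The second item could look roughly like the sketch below. It is an
illustration, not the actual patch: the 'Renamer' interface is a hypothetical
stand-in for the rename(src, dst) call of a Hadoop FileSystem, and
'renameOrThrow' is a made-up helper name. The point is only that a 'false'
return value is turned into an exception instead of being dropped.

```java
import java.io.IOException;

class RenameCheck {
    /** Hypothetical stand-in for FileSystem.rename(src, dst). */
    interface Renamer {
        boolean rename(String src, String dst) throws IOException;
    }

    /** Calls rename() and converts a 'false' result into an IOException. */
    public static void renameOrThrow(Renamer fs, String src, String dst)
            throws IOException {
        if (!fs.rename(src, dst)) {
            throw new IOException("Failed to move '" + src + "' to '" + dst
                + "': rename() returned false");
        }
    }

    public static void main(String[] args) throws IOException {
        Renamer succeeds = (src, dst) -> true;
        Renamer failsSilently = (src, dst) -> false; // models the ABFS case

        renameOrThrow(succeeds, "/tmp/a", "/warehouse/a"); // no exception
        try {
            renameOrThrow(failsSilently, "/tmp/b", "/warehouse/b");
        } catch (IOException e) {
            // The failure is now visible to the caller.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Filesystems that already throw on failure are unaffected by such a wrapper,
since their exception simply propagates to the caller.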



