[ https://issues.apache.org/jira/browse/IMPALA-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312199#comment-17312199 ]

ASF subversion and git services commented on IMPALA-10579:
----------------------------------------------------------

Commit 1d839e423e51b05314e3dbfd790cb1fa7fc82d98 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1d839e4 ]

IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

HDFS FileSystem provides a listStatusIterator() API for listing remote
storage using a RemoteIterator. We use it to list files when loading
table file metadata.

A RemoteIterator is not guaranteed to remain usable after its hasNext()
or next() throws an IOException. We should stop the loop in that case;
otherwise, we may go into an infinite loop.

Without HADOOP-16685, it's also not guaranteed that
FileSystem.listStatusIterator() will throw a FileNotFoundException when
the path doesn't exist.

This patch refactors the file listing iterators so we don't need to
depend on these two assumptions. The basic idea is:
 - On one hand, we should not depend on a RemoteIterator's behavior
   after it has thrown an exception.
 - On the other hand, we make our own iterators more robust against
   transient sub-directories, so table loading won't fail because of
   them.
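
The first point can be sketched as a wrapper that never touches the base
iterator again once it has thrown. This is an illustrative sketch, not the
actual FileSystemUtil code; the interface and class names here are invented
for the example:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.NoSuchElementException;

// Hypothetical stand-in for Hadoop's RemoteIterator<T>, whose methods
// may throw checked IOExceptions.
interface ThrowingIterator<T> {
  boolean hasNext() throws IOException;
  T next() throws IOException;
}

// Once the underlying iterator throws, mark it dead and never call it again.
class SafeIterator<T> implements ThrowingIterator<T> {
  private final ThrowingIterator<T> base;
  private boolean exhausted = false;

  SafeIterator(ThrowingIterator<T> base) { this.base = base; }

  @Override
  public boolean hasNext() throws IOException {
    if (exhausted) return false;
    try {
      return base.hasNext();
    } catch (FileNotFoundException e) {
      // The path disappeared (e.g. a transient sub-directory was removed):
      // treat it as an empty listing instead of retrying forever.
      exhausted = true;
      return false;
    } catch (IOException e) {
      // After any other IOException the base iterator's state is undefined;
      // stop iterating and surface the error instead of looping.
      exhausted = true;
      throw e;
    }
  }

  @Override
  public T next() throws IOException {
    if (!hasNext()) throw new NoSuchElementException();
    return base.next();
  }
}
```

The key property is that `exhausted` is set before control leaves the catch
blocks, so no code path can re-enter the base iterator after a failure.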

Tests:
 - Looped test_insert_stress.py 100 times. Verified that non-existing
   subdirs are skipped and inserts are stable under high concurrency.

Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Reviewed-on: http://gerrit.cloudera.org:8080/17171
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Deadloop in table metadata loading when using an invalid RemoteIterator
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-10579
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10579
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 3.4.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> The file listing thread in catalogd will go into a dead loop if it gets a 
> RemoteIterator on a non-existing path. The first call to 
> RemoteIterator.hasNext() will throw a FileNotFoundException. However, this 
> exception will be caught and the loop will continue, resulting in a dead 
> loop. Related code: 
> [https://github.com/apache/impala/blob/d89c04bf806682d3449c566ce979632bd2ac5b29/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L789-L814]
> {code:java}
>   static class FilterIterator implements RemoteIterator<FileStatus> {
>     ...
>     public boolean hasNext() throws IOException {
>       ...
>       while (curFile_ == null) {
>         FileStatus next;
>         try {
>           if (!baseIterator_.hasNext()) return false; // <---- throws FileNotFoundException
>           ...
>           next = baseIterator_.next();
>         } catch (FileNotFoundException ex) {
>           ...
>           LOG.warn(ex.getMessage());
>           continue;  // <--------- catch the exception and continue into a dead loop
>         }
>         if (!isInIgnoredDirectory(startPath_, next)) {
>           curFile_ = next;
>           return true;
>         }
>       }
>       return true;
>     }
> {code}
> *When will the path being loaded not exist?*
>  It happens when the metadata (table/partition location) in HMS still has 
> the path, but the path has actually been removed from storage.
> *When will impala get such an invalid RemoteIterator?*
>  For FileSystem implementations that don't override the 
> FileSystem#listStatusIterator() interface, e.g. S3AFileSystem before 
> HADOOP-17281, AzureBlobFileSystem, and GoogleHadoopFileSystem.
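
The dead loop follows from the fact that hasNext() throws before any iterator
state advances, so the retried call is identical to the one that just failed.
A minimal standalone reproduction of that pattern (all names here are
hypothetical, and a cap is added so the spin terminates and can be observed):

```java
import java.io.FileNotFoundException;

// Hypothetical reproduction of the buggy catch-and-continue pattern.
class DeadLoopDemo {
  // Stand-in for baseIterator_.hasNext() on a deleted path: always throws.
  static int hasNextCalls = 0;

  static boolean baseHasNext() throws FileNotFoundException {
    hasNextCalls++;
    throw new FileNotFoundException("path was removed from storage");
  }

  // The buggy loop, with a cap standing in for "forever".
  static int spinUntilCap(int cap) {
    hasNextCalls = 0;
    while (hasNextCalls < cap) {
      try {
        if (!baseHasNext()) break;  // never returns: it throws every time
      } catch (FileNotFoundException e) {
        continue;                   // the bug: nothing changed, retry anyway
      }
    }
    return hasNextCalls;
  }
}
```

However large the cap, the loop only exits because of the cap: each retry
hits the same exception, which is exactly the behavior fixed by the commit
above.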



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
