[ https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vihang Karajgaonkar resolved IMPALA-8663.
-----------------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.3.0
> FileMetadataLoader should skip listing files in hidden and tmp directories
> --------------------------------------------------------------------------
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Critical
> Labels: catalog-v2, impala-acid
> Fix For: Impala 3.3.0
>
>
> Currently, the file metadata loader recursively lists the table and partition
> directories to get the fileStatuses. For each fileStatus we ignore hidden
> files in {{FileSystemUtil.isValidDataFile()}}. However, that is not
> sufficient. For instance, if Hive is inserting data into a table when the
> refresh is called, a staging directory may be present within the table
> directory. This staging directory is a hidden directory named
> {{.hive-staging_*}}. It can contain files whose own names are not hidden
> (that is, they do not start with a . or _). Such files should be treated as
> temporary files and not as valid data files.
>
> Another instance where we see this happen is in transactional tables, which
> have a {{.manifest}} file located in a {{_tmp}} directory within the table
> directory. This file should also be skipped and not considered a valid
> data file.
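
For illustration only, here is a minimal sketch of the check the description asks for: reject a file when any directory between the table directory and the file is hidden or temporary, not just when the file's own name is. This is not the committed fix. The class name {{StagingDirFilter}} and both methods are hypothetical. It uses {{java.nio.file.Path}} in place of Hadoop's {{Path}} to stay self-contained, and it assumes that any component starting with a . or _ should be ignored, which may be broader than what the actual patch does.

```java
import java.nio.file.Path;

// Hypothetical helper, not part of Impala's FileSystemUtil.
final class StagingDirFilter {
  private StagingDirFilter() {}

  /** True if a single path component starts with '.' or '_'. */
  static boolean isHiddenName(String name) {
    return name.startsWith(".") || name.startsWith("_");
  }

  /**
   * True if 'file' lies strictly under 'tableDir' and no component of the
   * relative path (intermediate directories or the file name itself) is
   * hidden. Assumption: every '.'- or '_'-prefixed directory is temporary.
   */
  static boolean isValidDataFile(Path tableDir, Path file) {
    if (file.equals(tableDir) || !file.startsWith(tableDir)) return false;
    Path relative = tableDir.relativize(file);
    for (Path component : relative) {
      if (isHiddenName(component.toString())) return false;
    }
    return true;
  }
}
```

With a table directory of {{/warehouse/t}}, this sketch accepts {{year=2019/000000_0}} but rejects {{.hive-staging_hive_2019/-ext-10000/000000_0}} and anything under {{_tmp}}, even though the leaf file names themselves are not hidden.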
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)