Vihang Karajgaonkar created IMPALA-8663:
-------------------------------------------
Summary: FileMetadataLoader should skip listing files in hidden and tmp directories
Key: IMPALA-8663
URL: https://issues.apache.org/jira/browse/IMPALA-8663
Project: IMPALA
Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar
Currently, the file metadata loader recursively lists the table and partition
directories to get the file statuses. For each FileStatus, hidden files are
ignored by {{FileSystemUtil.isValidDataFile()}}. However, that is not sufficient.
For instance, if Hive is inserting data into a table when the refresh is
called, the table directory may contain a staging directory. This staging
directory is a hidden directory named {{.hive-staging_*}}, and it may contain
files that are not themselves hidden (i.e. whose names do not start with '.' or
'_'). Such files are temporary and should not be treated as valid data files.
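A minimal sketch of the missing check, assuming the recursive listing yields paths relative to the table (or partition) root: a file is rejected if the file itself or any directory between it and the root starts with '.' or '_'. The class and method names ({{HiddenPathFilter}}, {{isInHiddenPath}}, {{filterDataFiles}}) and the sample paths are hypothetical illustrations, not Impala's actual API or fix.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenPathFilter {
    // Hypothetical helper: true if any component of a path relative to the
    // table/partition root starts with '.' or '_', i.e. the file is hidden or
    // sits somewhere below a hidden directory such as .hive-staging_*.
    static boolean isInHiddenPath(String relativePath) {
        for (String component : relativePath.split("/")) {
            if (component.startsWith(".") || component.startsWith("_")) {
                return true;
            }
        }
        return false;
    }

    // Keeps only relative paths that are neither hidden files nor inside hidden directories.
    static List<String> filterDataFiles(List<String> relativePaths) {
        List<String> result = new ArrayList<>();
        for (String p : relativePaths) {
            if (!isInHiddenPath(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative listing of a table directory during a concurrent Hive insert.
        List<String> listing = List.of(
                "000000_0",
                "_SUCCESS",
                ".hive-staging_hive_1/-ext-10000/000000_0",
                "p=1/000001_0");
        System.out.println(filterDataFiles(listing)); // [000000_0, p=1/000001_0]
    }
}
```

The check deliberately runs on the path relative to the root rather than on the absolute path, so that a table stored under a parent directory whose name happens to start with '.' or '_' is not excluded wholesale.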
Another instance is transactional tables, which have a {{.manifest}} file
located in a {{_tmp}} directory within the table directory. This file should
also be skipped and not considered a valid data file.
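Rather than listing everything and filtering afterwards, the loader could also avoid descending into hidden directories such as {{_tmp}} or {{.hive-staging_*}} at all. The sketch below illustrates that pruning idea with {{java.nio.file.Files.walkFileTree}} on a local file system, as an analogy only; Impala lists files through Hadoop's FileSystem API, and the class name {{PruningLister}} and the directory layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PruningLister {
    // Names starting with '.' or '_' are treated as hidden.
    static boolean isHiddenName(String name) {
        return name.startsWith(".") || name.startsWith("_");
    }

    // Returns sorted paths (relative to root) of non-hidden files, never
    // descending into hidden subdirectories.
    static List<String> listDataFiles(Path root) throws IOException {
        List<String> files = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Never prune the root itself; prune any hidden directory below it.
                if (!dir.equals(root) && isHiddenName(dir.getFileName().toString())) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (!isHiddenName(file.getFileName().toString())) {
                    files.add(root.relativize(file).toString().replace('\\', '/'));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build a small illustrative table directory in a temp location.
        Path table = Files.createTempDirectory("tbl");
        Files.createDirectories(table.resolve("p=1"));
        Files.createDirectories(table.resolve(".hive-staging_hive_1/-ext-10000"));
        Files.createDirectories(table.resolve("_tmp"));
        Files.createFile(table.resolve("000000_0"));
        Files.createFile(table.resolve("p=1/000001_0"));
        Files.createFile(table.resolve(".hive-staging_hive_1/-ext-10000/000000_0"));
        Files.createFile(table.resolve("_tmp/000000_0"));
        System.out.println(listDataFiles(table)); // [000000_0, p=1/000001_0]
    }
}
```

Pruning at the directory level also saves the cost of listing large staging or temporary directories whose contents would be discarded anyway.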
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)