Vihang Karajgaonkar created IMPALA-8663:
-------------------------------------------

             Summary: FileMetadataLoader should skip listing files in hidden 
and tmp directories
                 Key: IMPALA-8663
                 URL: https://issues.apache.org/jira/browse/IMPALA-8663
             Project: IMPALA
          Issue Type: Bug
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar


Currently, the file metadata loader recursively lists the table and partition
directories to get the file statuses. For each file status we skip hidden
files in {{FileSystemUtil.isValidDataFile()}}. However, that is not sufficient.
For instance, if Hive is inserting data into a table when the refresh is
called, a staging directory may be present within the table directory. This
staging directory is a hidden directory named {{.hive-staging_*}}. It can
contain files that are not themselves hidden (their names do not start with
a . or _). Such files are temporary and should not be considered valid data
files.

 

Another instance where we see this is in transactional tables, which have
a {{.manifest}} file located in a {{_tmp}} directory within the table
directory. This file should also be skipped and not considered a valid data
file.
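
A minimal sketch of the kind of check described above, assuming the fix is to
look at every path component between the table directory and the file rather
than at the file name alone. The class {{StagingPathFilter}} and its methods
are hypothetical names used for illustration; this is not the actual
{{FileSystemUtil}} code, and it uses java.nio paths instead of Hadoop's
{{Path}} so that it runs standalone. The example file names are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not Impala's implementation.
final class StagingPathFilter {

  // A name is treated as hidden if it starts with '.' or '_'.
  static boolean isHiddenName(String name) {
    return name.startsWith(".") || name.startsWith("_");
  }

  // Returns true only if neither the file name nor any directory between
  // tableDir and the file is hidden. This rejects files such as
  // <tableDir>/.hive-staging_xyz/000000_0 even though "000000_0" itself
  // is not a hidden name.
  static boolean isValidDataFile(Path tableDir, Path file) {
    if (!file.startsWith(tableDir)) {
      throw new IllegalArgumentException(file + " is not under " + tableDir);
    }
    Path relative = tableDir.relativize(file);
    for (Path component : relative) {
      if (isHiddenName(component.toString())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Path table = Paths.get("/warehouse/t");
    String[] candidates = {
      "/warehouse/t/year=2019/000000_0",
      "/warehouse/t/.hive-staging_hive_1/-ext-10000/000000_0",
      "/warehouse/t/_tmp/000000_0.manifest",
    };
    for (String c : candidates) {
      System.out.println(c + " valid=" + isValidDataFile(table, Paths.get(c)));
    }
  }
}
```

Checking the relative path, not the absolute one, keeps a hidden component
above the table directory (for example in the warehouse root) from causing
every file in the table to be rejected.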



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
