Karen Coppage created HIVE-24021:
------------------------------------
Summary: Read insert-only tables truncated by Impala correctly
Key: HIVE-24021
URL: https://issues.apache.org/jira/browse/HIVE-24021
Project: Hive
Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage
Impala truncates insert-only tables by writing a base directory containing an
empty file named "_empty" (as Hive should; see HIVE-20137). In Hive, a file name
beginning with an underscore generally denotes a temporary file that is not
supposed to be read by operations that did not create it.
Before HIVE-23495, getAcidState listed each directory in the table
(HdfsUtils#listLocatedStatus) and filtered out directories with names
beginning with an underscore or period, as they are presumably temporary. This
allowed files called "_empty" to be read, since Hive checked the directory name
and not the file name.
After HIVE-23495, we recursively list each file in the table
(AcidUtils#getHdfsDirSnapshots) with a filter that rejects files with names
beginning with an underscore or period, as they are presumably temporary. The
"_empty" file is therefore skipped, and as a result Hive reads the table data
as if the truncate operation had not happened.
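
The difference between the two listing strategies can be sketched roughly as
follows. This is only an illustration, not the actual Hive code: the class
ListingFilterSketch, its methods, and the sample paths are hypothetical, and
plain strings stand in for the real file system listing.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the listing logic; not the real Hive implementation.
class ListingFilterSketch {
    // Names starting with '_' or '.' are presumed to be temporary.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // First path component, e.g. "base_0000002" in "base_0000002/_empty".
    static String dirName(String path) {
        return path.substring(0, path.indexOf('/'));
    }

    // Last path component, e.g. "_empty" in "base_0000002/_empty".
    static String fileName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Old behavior as described above: only the directory name is checked.
    static List<String> filterByDirName(List<String> paths) {
        return paths.stream()
                .filter(p -> !isHidden(dirName(p)))
                .collect(Collectors.toList());
    }

    // New behavior as described above: the file name itself is checked.
    static List<String> filterByFileName(List<String> paths) {
        return paths.stream()
                .filter(p -> !isHidden(fileName(p)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up table layout: one old delta plus the base written by the truncate.
        List<String> paths = List.of(
                "delta_0000001_0000001/000000_0",
                "base_0000002/_empty");
        System.out.println("by directory name: " + filterByDirName(paths));
        System.out.println("by file name:      " + filterByFileName(paths));
    }
}
```

With the file-name check, nothing under the base directory survives the
filter, so only the older delta data remains visible.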
Since performance in getAcidState is important, probably the best solution is
to make an exception in the filter and accept files with the name "_empty".
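
A minimal sketch of such an exception, assuming the filter only needs the file
name to decide. The class AcidFileFilterSketch, the constant EMPTY_MARKER, and
the accept method are hypothetical names for illustration; the real filter in
Hive operates on file system paths and may look different.

```java
// Hypothetical sketch of the proposed filter; not the actual Hive code.
class AcidFileFilterSketch {
    // Marker file written into the base directory by the truncate.
    static final String EMPTY_MARKER = "_empty";

    // Accepts ordinary files, rejects names starting with '_' or '.',
    // but makes an exception for the "_empty" marker.
    static boolean accept(String fileName) {
        if (EMPTY_MARKER.equals(fileName)) {
            return true;
        }
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_empty", "_tmp", ".hidden", "000000_0"}) {
            System.out.println(name + " accepted: " + accept(name));
        }
    }
}
```

The exception is an exact string comparison on the file name, so it adds no
extra file system calls to the listing.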
--
This message was sent by Atlassian Jira
(v8.3.4#803005)