Csaba Ringhofer created IMPALA-14189:
----------------------------------------
Summary: Cleanup subdirectories in truncate/insert overwrite if
recursive listing is enabled
Key: IMPALA-14189
URL: https://issues.apache.org/jira/browse/IMPALA-14189
Project: IMPALA
Issue Type: Improvement
Reporter: Csaba Ringhofer
Currently Impala doesn't delete files in subdirectories during truncate or
insert overwrite, while Hive does, even though both Hive and Impala do
recursive listing by default in external tables (this can be disabled with
impala.disable.recursive.listing).
Example:
{code}
show files in texternal; -- returns a single file in a subdirectory (nested_dir)
-> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
truncate texternal;
show files in texternal; -- returns the same result
-> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
insert overwrite texternal select * from texternal;
show files in texternal; -- the file in the subdir is still kept after insert overwrite
-> hdfs://localhost:20500/test-warehouse/texternal/f549975b8cf16b86-19a0de0d00000000_1586861351_data.0.txt
-> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
{code}
Hive deletes subdirectories during both truncate and insert overwrite
(it probably skips hidden folders, but I didn't check).
I think the correct solution would be to always delete the files that are
considered part of the table.
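
A minimal sketch of that rule, for illustration only: it is written in Python
against a local directory instead of HDFS and is not Impala's actual code. The
helper names (is_hidden, list_table_files, truncate_table) are hypothetical,
and treating names that start with "." or "_" as hidden is an assumption. The
point it tries to show is that truncate reuses the same listing that
"show files" would use, so the two cannot disagree.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def is_hidden(name: str) -> bool:
    # Assumption: names starting with "." or "_" are treated as hidden.
    return name.startswith((".", "_"))


def list_table_files(table_dir: Path, recursive: bool = True) -> List[Path]:
    """Return the files a table listing would report under table_dir."""
    files: List[Path] = []
    for root, dirs, names in os.walk(table_dir):
        if recursive:
            # Prune hidden subdirectories so os.walk never descends into them.
            dirs[:] = sorted(d for d in dirs if not is_hidden(d))
        else:
            # Non-recursive listing: only the table's top-level directory.
            dirs.clear()
        files.extend(Path(root) / n for n in sorted(names) if not is_hidden(n))
    return files


def truncate_table(table_dir: Path, recursive: bool = True) -> List[Path]:
    """Delete exactly the files that the listing reports and return them."""
    doomed = list_table_files(table_dir, recursive)
    for path in doomed:
        path.unlink()
    return doomed


# Hypothetical layout mirroring the example above: one data file in
# nested_dir plus a hidden directory that must survive the truncate.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "texternal"
    (table / "nested_dir").mkdir(parents=True)
    (table / "nested_dir" / "a.txt").write_text("1\n")
    (table / "_staging").mkdir()
    (table / "_staging" / "part.txt").write_text("2\n")

    truncate_table(table)
    print(list_table_files(table))  # no visible files remain
```

With recursive=False the same functions only touch top-level files, which
would correspond to a table where recursive listing has been disabled.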