Csaba Ringhofer created IMPALA-14189:
----------------------------------------

             Summary: Cleanup subdirectories in truncate/insert overwrite if 
recursing listing is enabled
                 Key: IMPALA-14189
                 URL: https://issues.apache.org/jira/browse/IMPALA-14189
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Csaba Ringhofer


Currently Impala doesn't delete files in subdirectories during truncate or 
insert overwrite, while Hive does, even though both Hive and Impala list 
external tables recursively by default (this can be disabled with 
impala.disable.recursive.listing).
Example:
{code}
show files in texternal; -- returns a single file in a subdirectory (nested_dir)
-> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
truncate texternal;
show files in texternal; -- returns the same result
-> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
insert overwrite texternal select * from texternal;
show files in texternal; -- the file in the subdir is still kept after insert overwrite
-> hdfs://localhost:20500/test-warehouse/texternal/f549975b8cf16b86-19a0de0d00000000_1586861351_data.0.txt
-> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
{code}

Hive deletes subdirectories during both truncate and insert overwrite 
(it probably skips hidden directories, but I didn't check).

I think that the correct solution would be to always delete the files that are 
considered part of the table.
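
A minimal sketch of that idea, using the local filesystem as a stand-in for 
HDFS. This is not Impala code: list_table_files, truncate_table and is_hidden 
are hypothetical names, and treating names that start with "." or "_" as 
hidden is an assumption based on the usual Hadoop convention. The point is 
only that truncate deletes exactly the set of files the listing returns, so 
the two can never disagree.

```python
from pathlib import Path


def is_hidden(name: str) -> bool:
    # Assumption: names starting with '.' or '_' are hidden and therefore
    # not considered part of the table.
    return name.startswith((".", "_"))


def list_table_files(table_dir: Path, recursive: bool = True) -> list:
    """Return the files considered part of the table (hypothetical helper)."""
    files = []
    for entry in sorted(table_dir.iterdir()):
        if is_hidden(entry.name):
            continue
        if entry.is_file():
            files.append(entry)
        elif entry.is_dir() and recursive:
            # With recursive listing enabled, files in subdirectories
            # belong to the table as well.
            files.extend(list_table_files(entry, recursive=True))
    return files


def truncate_table(table_dir: Path, recursive: bool = True) -> list:
    """Delete exactly the files that the listing returns."""
    deleted = list_table_files(table_dir, recursive)
    for path in deleted:
        path.unlink()
    return deleted
```

With recursive=False (the analogue of setting 
impala.disable.recursive.listing) the same function leaves files in 
subdirectories alone, which matches what the listing reports in that mode.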


