[ 
https://issues.apache.org/jira/browse/IMPALA-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-14189:
-------------------------------------
    Component/s: Catalog

> Cleanup subdirectories in truncate/insert overwrite if recursive listing is 
> enabled
> -----------------------------------------------------------------------------------
>
>                 Key: IMPALA-14189
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14189
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Priority: Critical
>
> Currently, Impala doesn't delete files in subdirectories during truncate or 
> insert overwrite, while Hive does, even though both Hive and Impala list 
> external tables recursively by default (this can be disabled with 
> impala.disable.recursive.listing).
> Example:
> {code}
> show files in texternal; -- returns a single file in a subdirectory (nested_dir)
> -> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
> truncate texternal;
> show files in texternal; -- returns the same result
> -> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
> insert overwrite texternal select * from texternal;
> show files in texternal; -- the file in the subdir is still kept after insert overwrite
> -> hdfs://localhost:20500/test-warehouse/texternal/f549975b8cf16b86-19a0de0d00000000_1586861351_data.0.txt
> -> hdfs://localhost:20500/test-warehouse/texternal/nested_dir/a.txt
> {code}
> Hive deletes subdirectories during both truncate and insert overwrite 
> (it probably skips hidden folders, but I didn't check).
> I think the correct solution would be to always delete the files that 
> are considered part of the table.
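
The proposed rule (delete exactly the files that the listing treats as part of the table) can be illustrated with a minimal sketch. This is not Impala or Hive code: it models the table directory on a local filesystem with Python's pathlib, and the names is_hidden, list_table_files and truncate_table are hypothetical. It also assumes that entries whose names start with "." or "_" count as hidden, which the description above did not verify.

```python
from pathlib import Path


def is_hidden(name: str) -> bool:
    # Assumption: names starting with "." or "_" are treated as hidden.
    return name.startswith((".", "_"))


def list_table_files(table_dir: Path, recursive: bool = True) -> list:
    """Return the files considered part of the table, in sorted order."""
    files = []
    for entry in sorted(table_dir.iterdir()):
        if is_hidden(entry.name):
            continue  # hidden files and directories are never part of the table
        if entry.is_file():
            files.append(entry)
        elif entry.is_dir() and recursive:
            # With recursive listing, files in subdirectories belong to the table.
            files.extend(list_table_files(entry, recursive))
    return files


def truncate_table(table_dir: Path, recursive: bool = True) -> list:
    """Delete exactly the files that the listing reports and return them."""
    deleted = list_table_files(table_dir, recursive)
    for path in deleted:
        path.unlink()
    return deleted
```

Because deletion reuses the listing function, the set of deleted files cannot drift from the set of files shown for the table, and turning recursive listing off leaves files in subdirectories untouched. The sketch removes files only; removing the emptied subdirectories themselves is left out.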



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
