[ https://issues.apache.org/jira/browse/IMPALA-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007218#comment-18007218 ]
ASF subversion and git services commented on IMPALA-14223:
----------------------------------------------------------
Commit 95a073aa08f88e3aa13345fab02b4b6981f18ca6 in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=95a073aa0 ]
IMPALA-14223: Cleanup subdirectories in INSERT OVERWRITE
If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, INSERT OVERWRITE and
TRUNCATE currently do not always delete these files, so old data can
remain in the table after the statement, leading to data corruption.
This change takes care of INSERT OVERWRITE.
Before this change, for unpartitioned external tables, only top-level
data files were deleted and data files in subdirectories (whether
hidden, ignored or normal) were kept.
After this change, subdirectories are deleted along with the
(non-hidden) data files, except for hidden and ignored directories
(for ignored directories, see --ignored_dir_prefix_list).
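
The rule described above can be modelled on a local filesystem roughly as
follows. This is only an illustrative sketch, not the code from the commit:
the function names are hypothetical, treating names that start with "." or
"_" as hidden is an assumption, matching ignored directories by name prefix
is an assumption, and "staging_" is a made-up prefix rather than a default
of --ignored_dir_prefix_list.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: names starting with "." or "_" count as hidden.
HIDDEN_PREFIXES = (".", "_")


def is_hidden(name):
    return name.startswith(HIDDEN_PREFIXES)


def cleanup_for_overwrite(table_dir, ignored_dir_prefixes=()):
    """Hypothetical model of the cleanup for an unpartitioned table.

    Deletes non-hidden files and non-hidden, non-ignored directories
    directly under table_dir. Returns the names of the removed entries.
    """
    removed = []
    for entry in sorted(Path(table_dir).iterdir()):
        if is_hidden(entry.name):
            continue  # hidden files and hidden directories are kept
        if entry.is_dir():
            # str.startswith(()) is False, so no prefixes means nothing ignored
            if entry.name.startswith(tuple(ignored_dir_prefixes)):
                continue  # ignored directories are kept
            shutil.rmtree(entry)  # removes the directory and its contents
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed


def demo():
    """Build a small table directory, clean it, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.parq").write_text("x")
        (root / "_hidden_file").write_text("x")
        (root / "sub").mkdir()
        (root / "sub" / "nested.parq").write_text("x")
        (root / ".hidden_dir").mkdir()
        (root / "staging_1").mkdir()
        removed = cleanup_for_overwrite(root, ignored_dir_prefixes=("staging_",))
        kept = sorted(p.name for p in root.iterdir())
        return removed, kept
```

In this model, demo() removes the top-level data file and the normal
subdirectory (including the file nested in it), and keeps the hidden file,
the hidden directory and the ignored directory.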
Note that for partitioned tables, INSERT OVERWRITE completely removes
the partition directories that are affected, and this change does not
alter that.
Testing:
- extended the tests in test_recursive_listing.py::TestRecursiveListing
Change-Id: I1a40a22e18e6a384da982d300422ac8995ed0273
Reviewed-on: http://gerrit.cloudera.org:8080/23165
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Daniel Becker <[email protected]>
> Cleanup subdirectories in INSERT OVERWRITE
> ------------------------------------------
>
> Key: IMPALA-14223
> URL: https://issues.apache.org/jira/browse/IMPALA-14223
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Critical
>
> This issue tracks the problem described in IMPALA-14189 for INSERT OVERWRITE.
> See parent issue for more details.