[ https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075406#comment-16075406 ]
Sergio Peña commented on HIVE-17001:
------------------------------------
[~zsombor.klara] I didn't understand the test case.
{noformat}
# One partition dt='p1' with row ("a",1) is added
insert into test_part partition(dt = 'p1') values ("a", 1);
# Only the partition metadata is removed (the data stays because it is an external table)
alter table test_part drop partition (dt='p1');
# Data is moved
dfs -mv ${system:test.tmp.dir}/test/dt=p1/000000_0 ${system:test.tmp.dir}/test/dt=p1/000000_1;
# Partition is re-created with dt='p1' with row ("b",2)
insert overwrite table test_part partition(dt = 'p1') values ("b", 2);
# This is correct: only one row is seen because the row ("a",1) was moved to
# another location manually.
# Where is the issue here?
select * from test_part;
{noformat}
> Insert overwrite table doesn't clean partition directory on HDFS if partition
> is missing from HMS
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Metastore
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new
> data files.
> For a partitioned table, the folders of existing partitions are cleaned on
> HDFS. However, if a partition folder exists only on HDFS and the partition
> definition is missing from HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.
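
The behaviour described above can be sketched with a small toy model. This is
not Hive's code: the class ToyWarehouse, its methods, and the
clean_unregistered flag are all hypothetical names made up for illustration.
The model assumes that an overwrite only clears a partition folder when the
partition is registered in the metastore, which is the behaviour the report
describes. Dropping the metadata while leaving the files in place stands in
for steps 3 to 5 of the reproduction.

```python
# Toy model only: this is NOT Hive's implementation. All names are hypothetical.
class ToyWarehouse:
    def __init__(self):
        self.hms_partitions = set()  # partitions registered in the metastore (HMS)
        self.hdfs = {}               # partition folder -> rows stored on "HDFS"

    def insert_into(self, part, row):
        # Append a row and register the partition if it is new.
        self.hms_partitions.add(part)
        self.hdfs.setdefault(part, []).append(row)

    def drop_partition_metadata(self, part):
        # Metadata is removed, files stay: the end state after steps 3 to 5.
        self.hms_partitions.discard(part)

    def insert_overwrite(self, part, row, clean_unregistered=False):
        # Assumed behaviour from the report: the folder is only cleared when
        # HMS knows the partition. clean_unregistered models the expected fix.
        if part in self.hms_partitions or clean_unregistered:
            self.hdfs[part] = []
        self.hms_partitions.add(part)
        self.hdfs.setdefault(part, []).append(row)

    def select_all(self):
        # Read every row of every registered partition.
        return [row
                for part in sorted(self.hms_partitions)
                for row in self.hdfs.get(part, [])]


def reproduce(clean_unregistered):
    wh = ToyWarehouse()
    wh.insert_into("ds=p1", "a")
    wh.drop_partition_metadata("ds=p1")
    wh.insert_overwrite("ds=p1", "b", clean_unregistered)
    return wh.select_all()


print(reproduce(False))  # stale row survives the overwrite
print(reproduce(True))   # folder is cleared even though HMS had no partition
```

With clean_unregistered=False the stale row from the orphaned folder is read
back together with the new one; with clean_unregistered=True only the newly
written row remains.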