[ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075421#comment-16075421
 ] 

Naveen Gangam commented on HIVE-17001:
--------------------------------------

[~zsombor.klara] A quick question on the issue. I am a bit confused by the 
mismatch between the jira summary and the reproducer. The summary says "insert 
overwrite", but the reproducer does not use "insert overwrite" (step 6 is a 
plain INSERT INTO). So I am wondering whether the reproducer is written as 
intended.
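
For reference, if the summary is what was intended, I would have expected step 
6 to look roughly like the sketch below. This is only my assumption about the 
intended statement, not something taken from the description. I use SELECT 'b' 
rather than VALUES because I am not certain VALUES is accepted together with 
INSERT OVERWRITE on every Hive version.

```sql
-- Assumed overwrite variant of step 6 (not part of the original reproducer).
-- SELECT without a FROM clause needs Hive 0.13 or later.
INSERT OVERWRITE TABLE test PARTITION(ds='p1') SELECT 'b';

-- With overwrite semantics, step 7 would be expected to return only 'b'.
SELECT * FROM test;
```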

I am not sure if this is a bug. Say you execute the following:
INSERT INTO test PARTITION(ds='p1') values ('a');
INSERT INTO test PARTITION(ds='p1') values ('a');

The resulting partition directory should contain 2 data files, and a select * 
on the table should return 2 rows. This is by design.
The testcase in this jira is semantically similar to the case above: you have 
some existing data in a partition and you are inserting additional data. 
Would you agree?

Normally, step 4 of the reproducer should have deleted the data for the 
partition, had it existed. But I think it is also legal to manage some or all 
of the partition data externally.
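
By managing data externally I mean something like the following: the files are 
placed under the partition directory outside of Hive, and the partition is 
then registered with HMS. A rough sketch, assuming the directory recreated in 
step 5 sits at the partition's default location under the table directory. 
The two statements are alternatives, and neither is part of the reproducer.

```sql
-- Option 1: register the externally created directory as partition ds='p1'.
ALTER TABLE test ADD PARTITION (ds='p1');

-- Option 2: let Hive discover partition directories that are missing from HMS.
MSCK REPAIR TABLE test;
```

Either way, HMS learns about the partition without Hive having written the 
files itself.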

Does that make sense? Thanks.

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17001
>                 URL: https://issues.apache.org/jira/browse/HIVE-17001
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Metastore
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>         Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table we will clean any folder of existing partitions on 
> HDFS, however if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT INTO test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
