Gabor Kaszab created HIVE-19830:
-----------------------------------

             Summary: Inconsistent behavior when multiple partitions point to the same location
                 Key: HIVE-19830
                 URL: https://issues.apache.org/jira/browse/HIVE-19830
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.4.0
            Reporter: Gabor Kaszab
            Assignee: Adam Szita


// Create a table with 2 partitions that share the same location, then insert 
a single row into one of them.
create table test (i int) partitioned by (j int) stored as parquet;
alter table test add partition (j=1) location 
'hdfs://localhost:20500/test-warehouse/test/j=1';
alter table test add partition (j=2) location 
'hdfs://localhost:20500/test-warehouse/test/j=1';
insert into table test partition (j=1) values (1);

// select * shows this single row in both partitions, as expected.
select * from test;
1 1
1 2

// However, sum() does not count the row once per partition: given the two 
rows above, the expected result is 2 3. This is +Issue #1+.
select sum( i), sum(j) from test;
1 2
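
For reference, the aggregates implied by the select * output above can be checked by hand. This is only a small Python sketch over the two rows shown in this report; the variable names are made up for illustration and nothing here is Hive code.

```python
# Rows as printed by `select * from test` in this report, as (i, j) pairs.
rows = [(1, 1), (1, 2)]

# Sum each column over all rows, as sum(i) and sum(j) would be expected to do.
expected_sum_i = sum(i for i, _ in rows)
expected_sum_j = sum(j for _, j in rows)

print(expected_sum_i, expected_sum_j)
```

The query in this report returned 1 2 instead, which matches counting the row for the j=2 partition only.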

// On the file system there is a single directory shared by the 2 partitions, 
as expected.
hdfs dfs -ls hdfs://localhost:20500/test-warehouse/test/
Found 1 items
drwxr-xr-x - gaborkaszab supergroup 0 2018-06-08 10:54 
hdfs://localhost:20500/test-warehouse/test/j=1

// Let's drop one of the partitions now!
alter table test drop partition (j=2);
// Running the same hdfs dfs -ls command shows that the j=1 directory has been 
deleted. I think this is good behavior; we just have to document that it is 
the expected case.
// select * from test; returns zero rows, which is also as expected.

// Even though the directory has been deleted, the j=1 partition is still 
listed by show partitions. This is +Issue #2+.
show partitions test;
j=1

After the directory is dropped through Hive, Impala reloads its partitions by 
asking Hive which partitions exist. Apparently, Hive returns a list that still 
includes the j=1 partition, so Impala treats it as an existing partition and 
does not remove it from the Catalog's cache. Hive should not return that 
partition. This is +Issue #3+.
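
The state behind Issues #2 and #3 can be illustrated with a toy model. The Python sketch below is an assumption-laden simplification, not Hive's or Impala's implementation: the dictionary, the set, and both helper functions are hypothetical names introduced only to show how a partition entry can outlive its directory.

```python
# Toy model (hypothetical, not Hive code): partition name -> storage location.
partitions = {
    "j=1": "hdfs://localhost:20500/test-warehouse/test/j=1",
    "j=2": "hdfs://localhost:20500/test-warehouse/test/j=1",
}
# Directories that currently exist on the file system.
directories = {"hdfs://localhost:20500/test-warehouse/test/j=1"}


def drop_partition(name):
    """Remove the partition's metadata entry and delete its directory."""
    location = partitions.pop(name)
    directories.discard(location)


def dangling_partitions():
    """Return partitions whose location no longer exists."""
    return sorted(p for p, loc in partitions.items() if loc not in directories)


drop_partition("j=2")
print(sorted(partitions))     # partitions still listed after the drop
print(dangling_partitions())  # listed partitions with no backing directory
```

In this model j=1 is still listed after the drop but has no directory behind it, which is the state the report describes.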



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
