[ 
https://issues.apache.org/jira/browse/HIVE-16017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16017:
------------------------------------
    Description: 
Update: this appears to affect many more queries, and it started after a recent 
master merge, following a period when I was not working on the feature.

The query below duplicates its output: since it is a self-union, each row 
should appear 2 times, but it appears 4 times. This happens for both MM and 
non-MM tables on the MM branch.

The inputs being added seem correct (in the non-MM case in particular, they are 
the same as before). Presumably one of the output-related changes on the branch 
is broken for this case; I am not sure which yet.

{noformat}
CREATE TABLE tbl1_mm(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;
insert overwrite table tbl1_mm select * from src where key < 10;

select key, value from tbl1_mm a where key < 6
union all
select key, value from tbl1_mm a where key < 6;
{noformat}
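
A possible way to quantify the duplication (a sketch, not part of the original 
report; the aliases a1, a2, t, u, b and the columns union_rows and base_rows are 
made up for illustration) is to compare per-row counts from the union against a 
plain scan of the same table. Assuming the plain scan itself is unaffected, 
union_rows should equal 2 * base_rows for every row; per the description above, 
the affected branch would presumably show 4 * base_rows instead.

{noformat}
-- Hypothetical check: count how often each row appears in the self-union
-- and compare it with how often it appears in the table itself.
select u.key, u.value, u.union_rows, b.base_rows
from (
  select t.key, t.value, count(*) as union_rows
  from (
    select key, value from tbl1_mm a1 where key < 6
    union all
    select key, value from tbl1_mm a2 where key < 6
  ) t
  group by t.key, t.value
) u
join (
  -- Baseline: the same filter without the union.
  select key, value, count(*) as base_rows
  from tbl1_mm
  where key < 6
  group by key, value
) b
on u.key = b.key and u.value = b.value;
{noformat}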

  was:
This duplicates the data (given that the original query is a self-union, 
essentially outputs it 4 times instead of 2) for either MM or non-MM tables, on 
MM branch.

It seems to be adding correct inputs (esp. in non-MM case the inputs are the 
same as before). Presumably something in the output changes in the branch is 
broken for this case. Not sure what yet. 

{noformat}
CREATE TABLE tbl1_mm(key int, value string) CLUSTERED BY (key) SORTED BY (key) 
INTO 2 BUCKETS;
insert overwrite table tbl1_mm select * from src where key < 10;

select key, value from tbl1_mm a where key < 6
union all
select key, value from tbl1_mm a where key < 6;
{noformat}


> MM tables - many queries duplicate the data after master merge
> --------------------------------------------------------------
>
>                 Key: HIVE-16017
>                 URL: https://issues.apache.org/jira/browse/HIVE-16017
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)