-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23298/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-7045
    https://issues.apache.org/jira/browse/HIVE-7045


Repository: hive-git


Description
-------

This happens whenever there are more than 1 reducers.

The scenario :

CREATE  TABLE t1 (a int, b int);
CREATE  TABLE t2 (cnt int) PARTITIONED BY (var_name string);

insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;

t1 contains :
1 1
2 2

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt 
insert overwrite table t2 partition(var_name='b') select count(b) cnt ;

select * from t2;
returns : 
2 a
2 b

as expected.

Setting the number of reducers higher than 1 :

set mapred.reduce.tasks=2;

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;

select * from t2;
1       a
1       a
1       b
1       b

Wrong results.

This happens when ever t1 is big enough to automatically generate more than 1 
reducers and without specifying it directly.

adding "group by 1" in the end of each insert solves the problem :

from  t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 
1;

generates : 
2 a
2 b

This should work without the group by...
The number of rows for each partition will be the amount of reducers.
Each reducer calculated a sub total of the count.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 399f92a 

Diff: https://reviews.apache.org/r/23298/diff/


Testing
-------


Thanks,

Navis Ryu

Reply via email to