Gabi Kazav created HIVE-4730:
--------------------------------
Summary: Join on more than 2^31 records on single reducer failed
(wrong results)
Key: HIVE-4730
URL: https://issues.apache.org/jira/browse/HIVE-4730
Project: Hive
Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1
Reporter: Gabi Kazav
Priority: Critical
A join over more than 2^31 rows on a single reducer produces wrong results. For example:
create table small_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
create table big_table (p1 string) ROW FORMAT DELIMITED LINES TERMINATED BY '\n';
Load 1 row into small_table (the value 1).
Load 2149580800 rows into big_table, all with the same value (1 in this case).
create table output as select a.p1 from big_table a join small_table b on (a.p1=b.p1);
select count(*) from output;
The count comes back as only 1, although every one of the 2149580800 rows in big_table matches the single row in small_table.
The reducer syslog:
...
2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used memory = 32925960
2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used memory = 12815184
2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used memory = 26684552 <-- looks like wrong value..
...
2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used memory = 17715896
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished. closing...
2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded 1 rows
2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded 1 rows
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished. closing...
2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded 0 rows
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done
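
This report does not identify a root cause. One hypothesis consistent with the 2^31 threshold is that some per-key row counter or index on the reducer side is held in a 32-bit signed int and wraps once the count passes Integer.MAX_VALUE (2147483647). The sketch below is only an illustration of that arithmetic in plain Java. It is not Hive code, and the class and method names (RowCounterSketch, asInt, bump) are made up for this example.

```java
class RowCounterSketch {
    // Hypothetical helper: the value a 64-bit row count takes when it is
    // narrowed to a 32-bit signed int (only the low 32 bits are kept).
    static int asInt(long rows) {
        return (int) rows;
    }

    // Hypothetical helper: increment an int counter a given number of times.
    // Java int arithmetic wraps silently on overflow; no exception is thrown.
    static int bump(int counter, int steps) {
        for (int i = 0; i < steps; i++) {
            counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        long bigTableRows = 2149580800L; // row count from the report

        // The report's row count exceeds the int range by 2097153 rows.
        System.out.println("rows above Integer.MAX_VALUE: "
                + (bigTableRows - Integer.MAX_VALUE));

        // Narrowing the count to int yields a negative number.
        System.out.println("row count as int: " + asInt(bigTableRows));

        // A single increment past Integer.MAX_VALUE wraps to Integer.MIN_VALUE.
        System.out.println("int counter after overflow: "
                + bump(Integer.MAX_VALUE, 1));

        // A long counter holds the same count without wrapping.
        long longCounter = Integer.MAX_VALUE;
        longCounter++;
        System.out.println("long counter after same step: " + longCounter);
    }
}
```

If a counter of this kind were involved, any logic that compares it against a threshold or uses it as an index would misbehave once it turns negative. Whether that is what happens in JoinOperator still needs to be confirmed against the Hive source.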