[
https://issues.apache.org/jira/browse/HIVE-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
gaozhan ding updated HIVE-23667:
--------------------------------
Description:
We use hive with version 3.1.0 with tez engine 0.9.1.3
I encountered an error when executing a hive SQL. This SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=true;
use xxx;
select count(*) from A join B on B.b=A.a;{code}
with the output.
{code:java}
+----------+ | _c0 | +----------+ | 4954736 | +----------+
{code}
But when the hive.auto.convert.join option is set to false,the utput is not as
expected。
The SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=false; //changed
use xxx;
select count(*) from A join B on B.b=A.a;{code}
with output:
{code:java}
+------+ | _c0 | +------+ | 0 | +------+
{code}
Beside,both tables participating in the join are partition tables.
Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql
output expected results.
We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only
occurred in the old hive table.
was:
We use hive with version 3.1.0 with tez engine 0.9.1.3
I encountered an error when executing a hive SQL. This SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=true;
use xxx;
select count(*) from 230_dim_site join dw_fact_inverter_detail on
dw_fact_inverter_detail.site=230_dim_site.id;{code}
with the output.
{code:java}
+----------+ | _c0 | +----------+ | 4954736 | +----------+
{code}
But when the hive.auto.convert.join option is set to false,the utput is not as
expected。
The SQL is as follows
{code:java}
set mapreduce.job.queuename=root.xxx;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions.pernode=10000;
set hive.exec.max.dynamic.partitions=10000;
set hive.fileformat.check=false;
set mapred.reduce.tasks=50;
set hive.auto.convert.join=false; //changed
use xxx;
select count(*) from 230_dim_site join dw_fact_inverter_detail on
dw_fact_inverter_detail.site=230_dim_site.id;{code}
with output:
{code:java}
+------+ | _c0 | +------+ | 0 | +------+
{code}
Beside,both tables participating in the join are partition tables.
Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql
output expected results.
We just upgraded hive from 1.2 to 3.1.0, and we found that these problems only
occurred in the old hive table.
> Incorrect output with option hive.auto.convert.join=fasle
> ---------------------------------------------------------
>
> Key: HIVE-23667
> URL: https://issues.apache.org/jira/browse/HIVE-23667
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: gaozhan ding
> Priority: Critical
>
> We use hive with version 3.1.0 with tez engine 0.9.1.3
> I encountered an error when executing a hive SQL. This SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=10000;
> set hive.exec.max.dynamic.partitions=10000;
> set hive.fileformat.check=false;
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=true;
> use xxx;
> select count(*) from A join B on B.b=A.a;{code}
> with the output.
> {code:java}
> +----------+ | _c0 | +----------+ | 4954736 | +----------+
> {code}
> But when the hive.auto.convert.join option is set to false,the utput is not
> as expected。
> The SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=10000;
> set hive.exec.max.dynamic.partitions=10000;
> set hive.fileformat.check=false;
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=false; //changed
> use xxx;
> select count(*) from A join B on B.b=A.a;{code}
> with output:
> {code:java}
> +------+ | _c0 | +------+ | 0 | +------+
> {code}
> Beside,both tables participating in the join are partition tables.
> Especially,if the option mapred.reduce.tasks=50 was not set,all above the sql
> output expected results.
> We just upgraded hive from 1.2 to 3.1.0, and we found that these problems
> only occurred in the old hive table.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)