[
https://issues.apache.org/jira/browse/HIVE-21170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krzysztof Zarzycki updated HIVE-21170:
--------------------------------------
Environment:
Hive distribution: HDP 3.1.0.
LLAP execution engine in version:
{code:java}
$ beeline --version
Hive 3.1.0.3.1.0.0-78
Git
git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
-r 56673b027117d8cb3400675b1680a4d992360808
Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
>From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc{code}
was:
Hive distribution: HDP 3.1.0.
LLAP execution engine in version:
{code:java}
$ beeline --version
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive 3.1.0.3.1.0.0-78
Git
git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
-r 56673b027117d8cb3400675b1680a4d992360808
Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
>From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc{code}
> Wrong (no) results of cross-product query executed on LLAP
> -----------------------------------------------------------
>
> Key: HIVE-21170
> URL: https://issues.apache.org/jira/browse/HIVE-21170
> Project: Hive
> Issue Type: Bug
> Environment: Hive distribution: HDP 3.1.0.
> LLAP execution engine in version:
> {code:java}
> $ beeline --version
> Hive 3.1.0.3.1.0.0-78
> Git
> git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
> -r 56673b027117d8cb3400675b1680a4d992360808
> Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
> From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc{code}
>
> Reporter: Krzysztof Zarzycki
> Priority: Critical
>
> *In the environment:* \{code}
> $ beeline --version
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive 3.1.0.3.1.0.0-78
> Git
> git://ctr-e138-1518143905142-586755-01-000015.hwx.site/grid/0/jenkins/workspace/HDP-parallel-centos7/SOURCES/hive
> -r 56673b027117d8cb3400675b1680a4d992360808
> Compiled by jenkins on Thu Dec 6 12:27:21 UTC 2018
> From source with checksum 97cc61f6acbe68b1fa988aa9f76b34cc
> {code}
> *On LLAP execution engine*, the following query gives *wrong results*:
> {code}
> -- prepare test data
> set hive.query.results.cache.enabled=false;
> create table test1 (id int);
> insert into test1 values (1),(2),(3);
> -- query
> select * from test1 t1 cross join test1 t2;
> {code}
> *Query result:*
> {code}
> 0: jdbc:hive2://hostname:> select t1.* from test1 t1 cross join test1 t2;
> INFO : Compiling
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6):
> select t1.* from test1 t1 cross join test1 t2
> INFO : Warning: Shuffle Join MERGEJOIN[9][tables = [$hdt$_0, $hdt$_1]] in
> Stage 'Reducer 2' is a cross product
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t1.id,
> type:int, comment:null)], properties:null)
> INFO : Completed compiling
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6);
> Time taken: 0.229 seconds
> INFO : Executing
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6):
> select t1.* from test1 t1 cross join test1 t2
> INFO : Query ID = hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6
> INFO : Total jobs = 1
> INFO : Launching Job 1 out of 1
> INFO : Starting task [Stage-1:MAPRED] in parallel
> ----------------------------------------------------------------------------------------------
> VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 .......... llap SUCCEEDED 1 1 0 0 0 0
> Map 3 .......... llap SUCCEEDED 1 1 0 0 0 0
> Reducer 2 llap SUCCEEDED 0 0 0 0 0 0
> ----------------------------------------------------------------------------------------------
> VERTICES: 02/03 [==========================>>] 100% ELAPSED TIME: 1.92 s
> ----------------------------------------------------------------------------------------------
> INFO : Completed executing
> command(queryId=hive_20190125215942_7df8062c-8511-4915-a0d9-5e7ac84030f6);
> Time taken: 2.006 seconds
> INFO : OK
> +--------+
> | t1.id |
> +--------+
> +--------+
> No rows selected (2.284 seconds)
> {code}
> *Expected result:*
> {code}
> +--------+
> | t1.id |
> +--------+
> | 3 |
> | 3 |
> | 3 |
> | 2 |
> | 2 |
> | 2 |
> | 1 |
> | 1 |
> | 1 |
> +--------+
> 9 rows selected
> {code}
> *What worked as a workaround:*
> {\{set hive.tez.cartesian-product.enabled=false;}} (default true)
> Then query gave a correct result.
> *Difference in execution plans :*
> 1. With \{{set hive.tez.cartesian-product.enabled=true;}}:
> {code}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (XPROD_EDGE), Map 3 (XPROD_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 llap |
> | File Output Operator [FS_8] |
> | Merge Join Operator [MERGEJOIN_9] (rows=9 width=8) |
> | Conds:(Inner),Output:["_col0","_col1"] |
> | <-Map 1 [XPROD_EDGE] vectorized, llap |
> | XPROD_EDGE [RS_11] |
> | Select Operator [SEL_10] (rows=3 width=4) |
> | Output:["_col0"] |
> | TableScan [TS_0] (rows=3 width=4) |
> | zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
> | <-Map 3 [XPROD_EDGE] vectorized, llap |
> | XPROD_EDGE [RS_13] |
> | Select Operator [SEL_12] (rows=3 width=4) |
> | Output:["_col0"] |
> | TableScan [TS_2] (rows=3 width=4) |
> | zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
> | |
> +----------------------------------------------------+
> {code}
> With \{{set hive.tez.cartesian-product.enabled=false;}}:
> {code}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 3 (CUSTOM_SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 llap |
> | File Output Operator [FS_8] |
> | Merge Join Operator [MERGEJOIN_9] (rows=9 width=8) |
> | Conds:(Inner),Output:["_col0","_col1"] |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap |
> | PARTITION_ONLY_SHUFFLE [RS_11] |
> | Select Operator [SEL_10] (rows=3 width=4) |
> | Output:["_col0"] |
> | TableScan [TS_0] (rows=3 width=4) |
> | zarz@test1,t1, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
> | <-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap |
> | PARTITION_ONLY_SHUFFLE [RS_13] |
> | Select Operator [SEL_12] (rows=3 width=4) |
> | Output:["_col0"] |
> | TableScan [TS_2] (rows=3 width=4) |
> | zarz@test2,t2, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
> | |
> +----------------------------------------------------+
> {code}
> So I suspect some bug in XPROD_EDGE component executed on LLAP.
> *Note 1: Same query on Hive on Tez works as expected.*
> *Note 2: What I also tried but didn't help:*\{code}
> set hive.vectorized.execution.enabled=false; (default true)
> {code}
> That didn't change the result.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)