[
https://issues.apache.org/jira/browse/HIVE-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013144#comment-16013144
]
Hive QA commented on HIVE-16668:
--------------------------------
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12868374/HIVE-16668.1.patch
{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10720 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lvj_ptf] (batchId=15)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[table_nonprintable] (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_join30] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=97)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=97)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union17] (batchId=130)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union31] (batchId=100)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5288/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5288/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5288/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12868374 - PreCommit-HIVE-Build
> Hive on Spark generates incorrect plan and result with window function and lateral view
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-16668
> URL: https://issues.apache.org/jira/browse/HIVE-16668
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chao Sun
> Assignee: Chao Sun
> Attachments: HIVE-16668.1.patch
>
>
> To reproduce:
> {code}
> create table t1 (a string);
> create table t2 (a array<string>);
> create table dummy (a string);
> insert into table dummy values ("a");
> insert into t1 values ("1"), ("2");
> insert into t2 select array("1", "2", "3", "4") from dummy;
> set hive.auto.convert.join.noconditionaltask.size=3;
> explain
> with tt1 as (
> select a as id, count(*) over () as count
> from t1
> ),
> tt2 as (
> select id
> from t2
> lateral view outer explode(a) a_tbl as id
> )
> select tt1.count
> from tt1 join tt2 on tt1.id = tt2.id;
> {code}
> For Hive on Spark, the plan is:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
>
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 3), Map 1 (PARTITION-LEVEL SORT, 3)
>       DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:9
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: t1
>                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: 0 (type: int)
>                     sort order: +
>                     Map-reduce partition columns: 0 (type: int)
>                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: a (type: string)
>         Reducer 2
>             Local Work:
>               Map Reduce Local Work
>             Reduce Operator Tree:
>               Select Operator
>                 expressions: VALUE._col0 (type: string)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                 PTF Operator
>                   Function definitions:
>                       Input definition
>                         input alias: ptf_0
>                         output shape: _col0: string
>                         type: WINDOWING
>                       Windowing table definition
>                         input alias: ptf_1
>                         name: windowingtablefunction
>                         order by: 0 ASC NULLS FIRST
>                         partition by: 0
>                         raw input shape:
>                         window functions:
>                             window function definition
>                               alias: count_window_0
>                               name: count
>                               window function: GenericUDAFCountEvaluator
>                               window frame: PRECEDING(MAX)~FOLLOWING(MAX)
>                               isStar: true
>                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: _col0 is not null (type: boolean)
>                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string), count_window_0 (type: bigint)
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                       Spark HashTable Sink Operator
>                         keys:
>                           0 _col0 (type: string)
>                           1 _col0 (type: string)
>                       Reduce Output Operator
>                         key expressions: _col0 (type: string)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: string)
>                         Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>
>   Stage: Stage-1
>     Spark
>       DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:8
>       Vertices:
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: t2
>                   Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                   Lateral View Forward
>                     Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                       Lateral View Join Operator
>                         outputColumnNames: _col4
>                         Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                         Select Operator
>                           expressions: _col4 (type: string)
>                           outputColumnNames: _col0
>                           Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                           Map Join Operator
>                             condition map:
>                                  Inner Join 0 to 1
>                             keys:
>                               0 _col0 (type: string)
>                               1 _col0 (type: string)
>                             outputColumnNames: _col1
>                             input vertices:
>                               0 Reducer 2
>                             Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                             Select Operator
>                               expressions: _col1 (type: bigint)
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                               File Output Operator
>                                 compressed: false
>                                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                 table:
>                                     input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                                     output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     Select Operator
>                       expressions: a (type: array<string>)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                       UDTF Operator
>                         Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                         function name: explode
>                         outer lateral view: true
>                         Filter Operator
>                           predicate: col is not null (type: boolean)
>                           Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
>                           Lateral View Join Operator
>                             outputColumnNames: _col4
>                             Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                             Select Operator
>                               expressions: _col4 (type: string)
>                               outputColumnNames: _col0
>                               Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                               Map Join Operator
>                                 condition map:
>                                      Inner Join 0 to 1
>                                 keys:
>                                   0 _col0 (type: string)
>                                   1 _col0 (type: string)
>                                 outputColumnNames: _col1
>                                 input vertices:
>                                   0 Reducer 2
>                                 Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                 Select Operator
>                                   expressions: _col1 (type: bigint)
>                                   outputColumnNames: _col0
>                                   Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                   File Output Operator
>                                     compressed: false
>                                     Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
>                                     table:
>                                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>             Local Work:
>               Map Reduce Local Work
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> Note that there are two {{Map 1}} inputs to {{Reducer 2}} in the edges of Stage-2.
> Under Hive on Spark, the result of this query is:
> {code}
> 4
> 4
> 4
> 4
> {code}
> which is not correct: {{t1}} has only two rows, so {{count(*) over ()}} should be 2, and the join matches only ids {{1}} and {{2}}, i.e. the expected output is two rows of {{2}}.
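As a sanity check on what the correct output should be, the query can be simulated in plain Python (a hypothetical check, not part of the patch; the table contents mirror the INSERT statements in the reproduction steps above):

```python
# Simulate the reproduction query to derive the expected result.
t1 = ["1", "2"]                      # insert into t1 values ("1"), ("2")
t2 = [["1", "2", "3", "4"]]          # insert into t2 select array(...) from dummy

# tt1: count(*) over () attaches the total row count of t1 to every row.
tt1 = [(a, len(t1)) for a in t1]

# tt2: lateral view outer explode(a) emits one row per array element.
tt2 = [elem for arr in t2 for elem in arr]

# Inner join tt1.id = tt2.id, projecting tt1.count.
result = [cnt for (ident, cnt) in tt1 for other in tt2 if ident == other]
print(result)  # [2, 2] -- two rows of 2, not four rows of 4
```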
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)