[
https://issues.apache.org/jira/browse/HIVE-13623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943474#comment-15943474
]
Oleksiy Sayankin commented on HIVE-13623:
-----------------------------------------
I have no issue with Hive-2.1.1 and Tez-0.8.4
{code}
with a as (
select 1 as c1
union all
select 2 as c1
union all
select 3 as c1
),
b as (
select 1 as c1
)
select *
from ( select a.c1 as ac1, b.c1 as bc1
from a left outer join b
on a.c1 = b.c1
) c
where c.bc1 is null;
{code}
Results is on TEZ
{code}
Status: Running (Executing on YARN cluster with App id
application_1490371429102_0009)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING
FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0
0 0
Map 3 .......... container SUCCEEDED 1 1 0 0
0 0
Map 4 .......... container SUCCEEDED 1 1 0 0
0 0
Map 5 .......... container SUCCEEDED 1 1 0 0
0 0
----------------------------------------------------------------------------------------------
VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 11.23 s
----------------------------------------------------------------------------------------------
OK
2017-03-27 18:28:33,715 INFO [c7d93c00-eb42-4731-bccb-a8fd56995bd9 main]
mapred.FileInputFormat: Total input paths to process : 2
3 NULL
2 NULL
Time taken: 18.484 seconds, Fetched: 2 row(s)
{code}
Query plan on Hive-2.1.1 on Tez-0.8.4
{code}
> explain with a as (
> select 1 as c1
> union all
> select 2 as c1
> union all
> select 3 as c1
> ),
> b as (
> select 1 as c1
> )
> select *
> from ( select a.c1 as ac1, b.c1 as bc1
> from a left outer join b
> on a.c1 = b.c1
> ) c
> where c.bc1 is null;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: mapr_20170327182950_39a484d6-3b9b-4021-a262-ba2d8722cf35:4
Edges:
Map 1 <- Map 5 (BROADCAST_EDGE), Union 2 (CONTAINS)
Map 3 <- Map 5 (BROADCAST_EDGE), Union 2 (CONTAINS)
Map 4 <- Map 5 (BROADCAST_EDGE), Union 2 (CONTAINS)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col1
input vertices:
1 Map 5
Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Filter Operator
predicate: _col1 is null (type: boolean)
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int), null (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Map 3
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: 2 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col1
input vertices:
1 Map 5
Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Filter Operator
predicate: _col1 is null (type: boolean)
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int), null (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Map 4
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: 3 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col0 (type: int)
1 1 (type: int)
outputColumnNames: _col0, _col1
input vertices:
1 Map 5
Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Filter Operator
predicate: _col1 is null (type: boolean)
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int), null (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Map 5
Map Operator Tree:
TableScan
alias: _dummy_table
Row Limit Per Split: 1
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: 1 (type: int)
sort order: +
Map-reduce partition columns: 1 (type: int)
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: int)
Reduce Output Operator
key expressions: 1 (type: int)
sort order: +
Map-reduce partition columns: 1 (type: int)
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: int)
Reduce Output Operator
key expressions: 1 (type: int)
sort order: +
Map-reduce partition columns: 1 (type: int)
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: int)
Union 2
Vertex: Union 2
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.223 seconds, Fetched: 156 row(s)
{code}
> Hive on tez produce wrong results when withClause and (outer) joins
> -------------------------------------------------------------------
>
> Key: HIVE-13623
> URL: https://issues.apache.org/jira/browse/HIVE-13623
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.0, 2.0.0
> Reporter: JinsuKim
> Assignee: Sivashankar
> Priority: Trivial
>
> Hive on mr produce correct results when (outer) joins and withClause. But tez
> produce wrong results
> {code:sql|title=Case1}
> with a as (
> select 1 as c1
> union all
> select 2 as c1
> union all
> select 3 as c1
> ),
> b as (
> select 1 as c1
> )
> select *
> from ( select a.c1 as ac1, b.c1 as bc1
> from a left outer join b
> on a.c1 = b.c1
> ) c;
> where c.bc1 is null
> {code}
> {code:title=case1 result}
> mr :
> +--------+--------+--+
> | c.ac1 | c.bc1 |
> +--------+--------+--+
> | 2 | NULL |
> | 3 | NULL |
> +--------+--------+--+
> tez :
> +--------+--------+--+
> | c.ac1 | c.bc1 |
> +--------+--------+--+
> +--------+--------+--+
> {code}
> Case2 as similar to Case1 produces same results.
> {code:sql|title=Case2}
> with a as (
> select * from j1
> ),
> b as (
> select * from j2
> )
> select *
> from ( select a.c1 as ac1, b.c1 as bc1
> from a left outer join b
> on a.c1 = b.c1
> ) c
> where c.bc1 is null;
>
> drop table j1;
> create table j1 as select c1 from
> ( select 1 as c1 from default.dual
> union all
> select 2 as c1 from default.dual
> union all
> select 3 as c1 from default.dual
> ) t1;
> drop table j2;
> create table j2 as
> select 1 as c1 from default.dual;
> {code}
> {code:title=case2 result}
> mr :
> +--------+--------+--+
> | c.ac1 | c.bc1 |
> +--------+--------+--+
> | 2 | NULL |
> | 3 | NULL |
> +--------+--------+--+
> tez :
> +--------+--------+--+
> | c.ac1 | c.bc1 |
> +--------+--------+--+
> | 2 | NULL |
> | 3 | NULL |
> +--------+--------+--+
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)