[
https://issues.apache.org/jira/browse/HIVE-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xuefu Zhang updated HIVE-12736:
-------------------------------
Description:
{code}
select * from staff;
1 jone 22 1
2 lucy 21 1
3 hmm 22 2
4 james 24 3
5 xiaoliu 23 3
select id,date_ from trade union all select id,"test" from trade ;
1 201510210908
2 201509080234
2 201509080235
1 test
2 test
2 test
set hive.execution.engine=spark;
set spark.master=local;
select /*+mapjoin(t)*/ * from staff s join
(select id,date_ from trade union all select id,"test" from trade ) t on
s.id=t.id;
1 jone 22 1 1 201510210908
2 lucy 21 1 2 201509080234
2 lucy 21 1 2 201509080235
set hive.execution.engine=mr;
select /*+mapjoin(t)*/ * from staff s join
(select id,date_ from trade union all select id,"test" from trade ) t on
s.id=t.id;
FAILED: SemanticException [Error 10227]: Not all clauses are supported with
mapjoin hint. Please remove mapjoin hint.
{code}
I have two questions
1.Why result of hive on spark not include the following record?
{code}
1 jone 22 1 1 test
2 lucy 21 1 2 test
2 lucy 21 1 2 test
{code}
2.Why there are two different ways of dealing same query?
explain 1:
{code}
set hive.execution.engine=spark;
set spark.master=local;
explain
select id,date_ from trade union all select id,"test" from trade;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Spark
DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: trade
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: id (type: int), date_ (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 12 Data size: 96 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Map 2
Map Operator Tree:
TableScan
alias: trade
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: id (type: int), 'test' (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 12 Data size: 96 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
explain 2:
{code}
set hive.execution.engine=spark;
set spark.master=local;
explain
select /*+mapjoin(t)*/ * from staff s join
(select id,date_ from trade union all select id,"test" from trade ) t on
s.id=t.id;
OK
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-1 depends on stages: Stage-2
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Spark
DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: trade
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: id is not null (type: boolean)
Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: id (type: int), date_ (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 id (type: int)
1 _col0 (type: int)
Local Work:
Map Reduce Local Work
Stage: Stage-1
Spark
DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2
Vertices:
Map 2
Map Operator Tree:
TableScan
alias: s
Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE
Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 id (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col7,
_col8
input vertices:
1 Map 1
Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string),
_col2 (type: int), _col3 (type: int), _col7 (type: int), _col8 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5
Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
table:
input format:
org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
I can't find any information about union "test" in explain 2.
Some properties on hive-site.xml is
{code}
<property>
<name>hive.ignore.mapjoin.hint</name>
<value>false</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask</name>
<value>true</value>
{code}
was:
select * from staff;
1 jone 22 1
2 lucy 21 1
3 hmm 22 2
4 james 24 3
5 xiaoliu 23 3
select id,date_ from trade union all select id,"test" from trade ;
1 201510210908
2 201509080234
2 201509080235
1 test
2 test
2 test
set hive.execution.engine=spark;
set spark.master=local;
select /*+mapjoin(t)*/ * from staff s join
(select id,date_ from trade union all select id,"test" from trade ) t on
s.id=t.id;
1 jone 22 1 1 201510210908
2 lucy 21 1 2 201509080234
2 lucy 21 1 2 201509080235
set hive.execution.engine=mr;
select /*+mapjoin(t)*/ * from staff s join
(select id,date_ from trade union all select id,"test" from trade ) t on
s.id=t.id;
FAILED: SemanticException [Error 10227]: Not all clauses are supported with
mapjoin hint. Please remove mapjoin hint.
I have two questions
1.Why result of hive on spark not include the following record?
1 jone 22 1 1 test
2 lucy 21 1 2 test
2 lucy 21 1 2 test
2.Why there are two different ways of dealing same query?
explain 1:
set hive.execution.engine=spark;
set spark.master=local;
explain
select id,date_ from trade union all select id,"test" from trade;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Spark
DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: trade
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: id (type: int), date_ (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 12 Data size: 96 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Map 2
Map Operator Tree:
TableScan
alias: trade
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: id (type: int), 'test' (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 12 Data size: 96 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
explain 2:
set hive.execution.engine=spark;
set spark.master=local;
explain
select /*+mapjoin(t)*/ * from staff s join
(select id,date_ from trade union all select id,"test" from trade ) t on
s.id=t.id;
OK
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-1 depends on stages: Stage-2
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Spark
DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: trade
Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: id is not null (type: boolean)
Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: id (type: int), date_ (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
Spark HashTable Sink Operator
keys:
0 id (type: int)
1 _col0 (type: int)
Local Work:
Map Reduce Local Work
Stage: Stage-1
Spark
DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2
Vertices:
Map 2
Map Operator Tree:
TableScan
alias: s
Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE
Column stats: NONE
Filter Operator
predicate: id is not null (type: boolean)
Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE
Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 id (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col7,
_col8
input vertices:
1 Map 1
Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: string),
_col2 (type: int), _col3 (type: int), _col7 (type: int), _col8 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5
Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
table:
input format:
org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
I can't find any information about union "test" in explain 2.
Some properties on hive-site.xml is
<property>
<name>hive.ignore.mapjoin.hint</name>
<value>false</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask</name>
<value>true</value>
> It seems that result of Hive on Spark be mistaken and result of Hive and Hive
> on Spark are not the same
> -------------------------------------------------------------------------------------------------------
>
> Key: HIVE-12736
> URL: https://issues.apache.org/jira/browse/HIVE-12736
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.2.1
> Reporter: JoneZhang
> Assignee: Chengxiang Li
> Attachments: HIVE-12736.1-spark.patch, HIVE-12736.2-spark.patch
>
>
> {code}
> select * from staff;
> 1 jone 22 1
> 2 lucy 21 1
> 3 hmm 22 2
> 4 james 24 3
> 5 xiaoliu 23 3
> select id,date_ from trade union all select id,"test" from trade ;
> 1 201510210908
> 2 201509080234
> 2 201509080235
> 1 test
> 2 test
> 2 test
> set hive.execution.engine=spark;
> set spark.master=local;
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
> s.id=t.id;
> 1 jone 22 1 1 201510210908
> 2 lucy 21 1 2 201509080234
> 2 lucy 21 1 2 201509080235
> set hive.execution.engine=mr;
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
> s.id=t.id;
> FAILED: SemanticException [Error 10227]: Not all clauses are supported with
> mapjoin hint. Please remove mapjoin hint.
> {code}
> I have two questions
> 1.Why result of hive on spark not include the following record?
> {code}
> 1 jone 22 1 1 test
> 2 lucy 21 1 2 test
> 2 lucy 21 1 2 test
> {code}
> 2.Why there are two different ways of dealing same query?
> explain 1:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain
> select id,date_ from trade union all select id,"test" from trade;
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Spark
> DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: trade
> Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
> Column stats: NONE
> Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats:
> COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 12 Data size: 96 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Map 2
> Map Operator Tree:
> TableScan
> alias: trade
> Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
> Column stats: NONE
> Select Operator
> expressions: id (type: int), 'test' (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6 Data size: 48 Basic stats:
> COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 12 Data size: 96 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> explain 2:
> {code}
> set hive.execution.engine=spark;
> set spark.master=local;
> explain
> select /*+mapjoin(t)*/ * from staff s join
> (select id,date_ from trade union all select id,"test" from trade ) t on
> s.id=t.id;
> OK
> STAGE DEPENDENCIES:
> Stage-2 is a root stage
> Stage-1 depends on stages: Stage-2
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-2
> Spark
> DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: trade
> Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE
> Column stats: NONE
> Filter Operator
> predicate: id is not null (type: boolean)
> Statistics: Num rows: 3 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: id (type: int), date_ (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 3 Data size: 24 Basic stats:
> COMPLETE Column stats: NONE
> Spark HashTable Sink Operator
> keys:
> 0 id (type: int)
> 1 _col0 (type: int)
> Local Work:
> Map Reduce Local Work
> Stage: Stage-1
> Spark
> DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2
> Vertices:
> Map 2
> Map Operator Tree:
> TableScan
> alias: s
> Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE
> Column stats: NONE
> Filter Operator
> predicate: id is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 66 Basic stats:
> COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 id (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3, _col7,
> _col8
> input vertices:
> 1 Map 1
> Statistics: Num rows: 6 Data size: 52 Basic stats:
> COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: int), _col1 (type: string),
> _col2 (type: int), _col3 (type: int), _col7 (type: int), _col8 (type: string)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5
> Statistics: Num rows: 6 Data size: 52 Basic stats:
> COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 6 Data size: 52 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format:
> org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> I can't find any information about union "test" in explain 2.
> Some properties on hive-site.xml is
> {code}
> <property>
> <name>hive.ignore.mapjoin.hint</name>
> <value>false</value>
> </property>
> <property>
> <name>hive.auto.convert.join</name>
> <value>true</value>
> </property>
> <property>
> <name>hive.auto.convert.join.noconditionaltask</name>
> <value>true</value>
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)