[ https://issues.apache.org/jira/browse/HIVE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-15146:
----------------------------------
Description:
Consider:
{noformat}
create table if not exists srcpart (a int, b int, c int)
partitioned by (z int)
clustered by (a) into 2 buckets
stored as orc
tblproperties("transactional"="true");
create temporary table if not exists data1 (x int);
insert into data1 values (1),(2),(3);
explain from data1
insert into srcpart partition(z) select 0,0,1,x
insert into srcpart partition(z=1) select 0,0,1;
{noformat}
Then the plan looks like:
{noformat}
2016-11-07T16:56:19,045 INFO [main] ql.TestTxnCommands2: STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
Stage-4 depends on stages: Stage-2
Stage-1 depends on stages: Stage-4
Stage-5 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
alias: data1
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: x (type: int)
outputColumnNames: _col3
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
value expressions: _col3 (type: int)
Select Operator
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Reduce Operator Tree:
Select Operator
expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-0
Move Operator
tables:
partition:
z
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-3
Stats-Aggr Operator
Stage: Stage-4
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
sort order:
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Select Operator
expressions: 0 (type: int), 0 (type: int), 1 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-1
Move Operator
tables:
partition:
z 1
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-5
Stats-Aggr Operator
{noformat}
Note that there are 2 stats aggregation tasks (Stage-3 and Stage-5), but both
branches of the multi-insert update the same partition.
Once HIVE-14943 is in, there will be other ways to generate the same situation.
In particular, it will be possible to have 2 or 3 branches of the multi-insert,
any or all of which use dynamic partition insert, which means the set of
partitions actually updated is not known until run time.
If at all possible, the solution should address this.
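
To make the overlap concrete, here is a minimal Python sketch that models which partitions each branch of the example multi-insert writes to. It is an illustration only, not Hive code: the function name, the branch encoding, and the variable names are all hypothetical. It assumes only what the example shows, namely that the dynamic branch writes each row of data1 to partition z=x and the static branch writes to z=1.

```python
# Illustrative model only; every name here is hypothetical and none of this
# reflects how Hive's planner or stats tasks are actually implemented.

def partitions_touched(rows, branches):
    """Return, for each branch, the set of z partitions that branch writes to.

    A branch is ("static", z_value) or ("dynamic", fn), where fn maps an
    input row to its partition value. For a dynamic branch the result
    depends on the rows, so it can only be computed at run time.
    """
    touched = []
    for kind, spec in branches:
        if kind == "static":
            touched.append({spec})
        else:
            touched.append({spec(row) for row in rows})
    return touched


data1 = [1, 2, 3]  # the rows inserted into data1 in the example

branches = [
    ("dynamic", lambda x: x),  # insert into srcpart partition(z) select 0,0,1,x
    ("static", 1),             # insert into srcpart partition(z=1) select 0,0,1
]

per_branch = partitions_touched(data1, branches)
overlap = set.intersection(*per_branch)      # partitions written by every branch
all_partitions = set.union(*per_branch)      # partitions written by any branch
```

In this model both branches touch z=1, which is the partition the two Stats-Aggr stages would both update. A single pass over the union of partitions would be one way to cover both branches, though whether that fits Hive's planner is not something this sketch establishes.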
was:
Consider:
{noformat}
create table if not exists srcpart (a int, b int, c int)
partitioned by (z int)
clustered by (a) into 2 buckets
stored as orc
tblproperties("transactional"="true");
create temporary table if not exists data1 (x int);
insert into data1 values (1),(2),(3);
explain from data1
insert into srcpart partition(z) select 0,0,1,x
insert into srcpart partition(z=1) select 0,0,1;
{noformat}
Then the plan looks like:
{noformat}
2016-11-07T16:56:19,045 INFO [main] ql.TestTxnCommands2: STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-0 depends on stages: Stage-2
Stage-3 depends on stages: Stage-0
Stage-4 depends on stages: Stage-2
Stage-1 depends on stages: Stage-4
Stage-5 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Map Reduce
Map Operator Tree:
TableScan
alias: data1
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: x (type: int)
outputColumnNames: _col3
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
value expressions: _col3 (type: int)
Select Operator
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
Reduce Operator Tree:
Select Operator
expressions: 0 (type: int), 0 (type: int), 1 (type: int), VALUE._col2 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-0
Move Operator
tables:
partition:
z
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-3
Stats-Aggr Operator
Stage: Stage-4
Map Reduce
Map Operator Tree:
TableScan
Reduce Output Operator
sort order:
Map-reduce partition columns: 0 (type: int)
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
Reduce Operator Tree:
Select Operator
expressions: 0 (type: int), 0 (type: int), 1 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 6 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-1
Move Operator
tables:
partition:
z 1
replace: false
table:
input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
name: default.srcpart
Stage: Stage-5
Stats-Aggr Operator
{noformat}
Note that there are 2 stats aggregation tasks but both branches of the
multi-insert update the same partition
Once HIVE-14943 is in, there will be other ways to generate the same sitation
> Too many Stats-Aggr Operator in multi-insert
> --------------------------------------------
>
> Key: HIVE-15146
> URL: https://issues.apache.org/jira/browse/HIVE-15146
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Eugene Koifman
> Assignee: Pengcheng Xiong