[
https://issues.apache.org/jira/browse/HIVE-14285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360203#comment-16360203
]
liyunzhang commented on HIVE-14285:
-----------------------------------
[~kgyrtkirk]: I have a question: are the works within one stage (such as
Map 1 and Map 4 in Stage-1) executed in parallel, even though Map 4 is listed
after Map 1 in the explain output?
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: srcpart
filterExpr: ds is not null (type: boolean)
Statistics: Num rows: 2000 Data size: 389248 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ds (type: string)
outputColumnNames: _col0
Statistics: Num rows: 2000 Data size: 368000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2000 Data size: 368000 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: llap
LLAP IO: no inputs
Map 4
Map Operator Tree:
TableScan
alias: srcpart_date
filterExpr: ((date = '2008-04-08') and ds is not null) (type: boolean)
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: ((date = '2008-04-08') and ds is not null) (type: boolean)
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: ds (type: string)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Dynamic Partitioning Event Operator
Target column: ds (string)
Target Input: srcpart
Partition key expr: ds
Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE
Target Vertex: Map 1
Execution mode: llap
LLAP IO: no inputs
Reducer 2
Execution mode: llap
Reduce Operator Tree:
Merge Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: string)
1 _col0 (type: string)
Statistics: Num rows: 2200 Data size: 404800 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint)
Reducer 3
Execution mode: llap
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
If the works in a stage are executed in parallel, I guess there is no problem
with {{ExplainTask#getBasictypeKeyedMap}}, which sorts the works by work name,
as raised in my previous question. Although this places Map10 ahead of Map6
in the explain output, Map10 and Map6 are executed in parallel at runtime.
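To make the ordering effect concrete, here is a minimal sketch, assuming the work names end up as plain {{String}} keys in a sorted map. The class {{WorkNameOrder}}, the helper {{sortedNames}} and the sample names are hypothetical and are not taken from {{ExplainTask}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WorkNameOrder {
    // Hypothetical helper: returns the names in the order in which a TreeMap
    // with String keys iterates over them (natural, lexicographic order).
    static List<String> sortedNames(String... names) {
        TreeMap<String, String> works = new TreeMap<>();
        for (String name : names) {
            works.put(name, "work for " + name);
        }
        return new ArrayList<>(works.keySet());
    }

    public static void main(String[] args) {
        // Compared as strings, "Map 10" sorts before "Map 6" because '1' < '6'.
        System.out.println(sortedNames("Map 6", "Map 10", "Map 1"));
        // prints [Map 1, Map 10, Map 6]
    }
}
```

This only affects the order in which the vertices are printed; it says nothing about the order in which they are scheduled.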
> Explain outputs: map-entry ordering of non-primitive objects.
> ---------------------------------------------------------------
>
> Key: HIVE-14285
> URL: https://issues.apache.org/jira/browse/HIVE-14285
> Project: Hive
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HIVE-14285.1.patch
>
>
> In HIVE-12244 I've left behind some ugly backward compatible getters with
> {{@Explain}} decorations to keep the qtests from breaking.
> There were heavy explain plan changes when I used {{Path}} objects as keys in
> {{@Explain}} marked methods.
> I've looked into the causes of this:
> * there is a {{TreeSet}} in there to keep all the keys in order.
> * but {{org.apache.hadoop.fs.Path}} uses a different sort order (inherited
> from {{java.net.URI}}): it sorts the paths using the priorities
> [scheme, schemeSpecificPart, host, path, query, fragment].
> Considering that the output is an explain result (possibly read by humans),
> I don't think this sophisticated sort order is useful.
> {{ExplainTask#outputMap}} always calls toString() on the keys before using
> them, so the most painless solution would be to change all the keys inside
> the {{TreeSet}} to simple strings (if they are not primitives already); this
> would restore the original behaviour for me.