[jira] [Commented] (HIVE-14285) Explain outputs: map-entry ordering of non-primitive objects.

liyunzhang (JIRA) Sun, 11 Feb 2018 17:15:21 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-14285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360203#comment-16360203
 ]


liyunzhang commented on HIVE-14285:
-----------------------------------

[~kgyrtkirk]:  I want to ask a question that the works in 1 Stage(like  Map1, 
Map4 in Stage-1) are executed in parallel although Map4 is after Map1 in 
explain.
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
#### A masked pattern was here ####
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  filterExpr: ds is not null (type: boolean)
                  Statistics: Num rows: 2000 Data size: 389248 Basic stats: 
COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: ds (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2000 Data size: 368000 Basic stats: 
COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 2000 Data size: 368000 Basic stats: 
COMPLETE Column stats: COMPLETE
            Execution mode: llap
            LLAP IO: no inputs
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: srcpart_date
                  filterExpr: ((date = '2008-04-08') and ds is not null) (type: 
boolean)
                  Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE 
Column stats: NONE
                  Filter Operator
                    predicate: ((date = '2008-04-08') and ds is not null) 
(type: boolean)
                    Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE
                    Select Operator
                      expressions: ds (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE
                          Dynamic Partitioning Event Operator
                            Target column: ds (string)
                            Target Input: srcpart
                            Partition key expr: ds
                            Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE
                            Target Vertex: Map 1
            Execution mode: llap
            LLAP IO: no inputs
        Reducer 2 
            Execution mode: llap
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 _col0 (type: string)
                  1 _col0 (type: string)
                Statistics: Num rows: 2200 Data size: 404800 Basic stats: 
COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count()
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                  Reduce Output Operator
                    sort order: 
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                    value expressions: _col0 (type: bigint)
        Reducer 3 
            Execution mode: llap
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                  table:
                      input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

{code}

If works in the Stage are executed in parallel, I guess there is no problem for 
{{ExplainTask#getBasictypeKeyedMap}} which sort works by the work#name in my 
previous question. Although it causes Map10 is in front of Map6 in above 
example in explain, Map10 and Map6 are executed parallel in the runtime.

>  Explain outputs: map-entry ordering of non-primitive objects. 
> ---------------------------------------------------------------
>
>                 Key: HIVE-14285
>                 URL: https://issues.apache.org/jira/browse/HIVE-14285
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Minor
>             Fix For: 2.3.0
>
>         Attachments: HIVE-14285.1.patch
>
>
> In HIVE-12244 I've left behind some ugly backward compatible getters with 
> {{@Explain}} decorations to keep the qtests from breaking.
> There were heavy explain plan changes when I used {{Path}} objects as keys in 
> {{@Explain}} marked methods.
> I've looked into the causes of this:
>  * there is a {{TreeSet}} in there to keep all the keys in order.
>  * but: {{org.apache.hadoop.fs.Path}} uses a different sort order (inherited 
> from {{java.net.URI}} )...it sorts the paths using 
> priorities:[schema,schemeSpecificPart,host,path,query,fragment]
>   considering that the output is an explain result(possibly read by humans): 
> i don't think this sophisticated sort order can be useful.
> {{ExplainTask#outputMap}} always calls toString() on the keys before using 
> them...so the most painless solution would be to change all the keys inside 
> the treeset to simple strings (in case it's not a primitive already); this 
> would restore the original behaviour for me.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-14285) Explain outputs: map-entry ordering of non-primitive objects.

Reply via email to