[jira] [Commented] (HIVE-15357) Fix and re-enable the spark-only tests

Rui Li (JIRA) Tue, 13 Dec 2016 05:03:33 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745110#comment-15745110
 ]


Rui Li commented on HIVE-15357:
-------------------------------

When I tried to re-enable these tests, I found this issue with DPP.
Running the following query in our qtest:
{code}
EXPLAIN select count(*) from srcpart join (select ds as ds, ds as `date` from 
srcpart group by ds) s on (srcpart.ds = s.ds) where s.`date` = '2008-04-08';
{code}
will get this following plan:
{noformat}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      Edges:
        Reducer 7 <- Map 6 (GROUP, 1)
      DagName: lirui_20161213205144_16c810ef-52f6-40c9-a346-7dbd2c9ef99e:2
      Vertices:
        Map 6 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
COMPLETE Column stats: NONE
                  Select Operator
                    Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
COMPLETE Column stats: NONE
                    Group By Operator
                      keys: '2008-04-08' (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: '2008-04-08' (type: string)
                        sort order: +
                        Map-reduce partition columns: '2008-04-08' (type: 
string)
                        Statistics: Num rows: 1000 Data size: 10624 Basic 
stats: COMPLETE Column stats: NONE
        Map 8 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ds (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
                    Group By Operator
                      keys: _col0 (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
                      Spark Partition Pruning Sink Operator
                        partition key expr: ds
                        Statistics: Num rows: 2000 Data size: 21248 Basic 
stats: COMPLETE Column stats: NONE
                        target column name: ds
                        target work: Map 1
        Reducer 7 
            Reduce Operator Tree:
              Group By Operator
                keys: '2008-04-08' (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
Column stats: NONE
                Select Operator
                  Statistics: Num rows: 500 Data size: 5312 Basic stats: 
COMPLETE Column stats: NONE
                  Select Operator
                    expressions: '2008-04-08' (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 500 Data size: 5312 Basic stats: 
COMPLETE Column stats: NONE
                    Group By Operator
                      keys: _col0 (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 500 Data size: 5312 Basic stats: 
COMPLETE Column stats: NONE
                      Spark Partition Pruning Sink Operator
                        partition key expr: ds
                        Statistics: Num rows: 500 Data size: 5312 Basic stats: 
COMPLETE Column stats: NONE
                        target column name: ds
                        target work: Map 5

  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 1)
        Reducer 3 <- Map 5 (PARTITION-LEVEL SORT, 1), Reducer 2 
(PARTITION-LEVEL SORT, 1)
        Reducer 4 <- Reducer 3 (GROUP, 1)
      DagName: lirui_20161213205144_16c810ef-52f6-40c9-a346-7dbd2c9ef99e:1
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
COMPLETE Column stats: NONE
                  Select Operator
                    Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
COMPLETE Column stats: NONE
                    Group By Operator
                      keys: '2008-04-08' (type: string)
                      mode: hash
                      outputColumnNames: _col0
                      Statistics: Num rows: 1000 Data size: 10624 Basic stats: 
COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: '2008-04-08' (type: string)
                        sort order: +
                        Map-reduce partition columns: '2008-04-08' (type: 
string)
                        Statistics: Num rows: 1000 Data size: 10624 Basic 
stats: COMPLETE Column stats: NONE
        Map 5 
            Map Operator Tree:
                TableScan
                  alias: srcpart
                  Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: ds (type: string)
                    sort order: +
                    Map-reduce partition columns: ds (type: string)
                    Statistics: Num rows: 2000 Data size: 21248 Basic stats: 
COMPLETE Column stats: NONE
        Reducer 2 
            Reduce Operator Tree:
              Group By Operator
                keys: '2008-04-08' (type: string)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
Column stats: NONE
                Select Operator
                  Statistics: Num rows: 500 Data size: 5312 Basic stats: 
COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: '2008-04-08' (type: string)
                    sort order: +
                    Map-reduce partition columns: '2008-04-08' (type: string)
                    Statistics: Num rows: 500 Data size: 5312 Basic stats: 
COMPLETE Column stats: NONE
        Reducer 3 
            Reduce Operator Tree:
              Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 ds (type: string)
                  1 _col0 (type: string)
                Statistics: Num rows: 2200 Data size: 23372 Basic stats: 
COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count()
                  mode: hash
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                  Reduce Output Operator
                    sort order: 
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                    value expressions: _col0 (type: bigint)
        Reducer 4 
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
Column stats: NONE
                  table:
                      input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
There's an extra MapWork8 in stage-2. The reason is because the joining key is 
partition key for both tables. So we created two DPP operators for the query. 
Tez doesn't have this problem because it has 
{{TezCompiler::runCycleAnalysisForPartitionPruning}} to remove such cyclic DPP 
dependency. I think we need to to the same, but not sure why we didn't see the 
issue when we implemented the feature. I'll find more about it.
[~csun], [~xuefuz] any ideas?

> Fix and re-enable the spark-only tests
> --------------------------------------
>
>                 Key: HIVE-15357
>                 URL: https://issues.apache.org/jira/browse/HIVE-15357
>             Project: Hive
>          Issue Type: Test
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Defined by {{spark.only.query.files}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15357) Fix and re-enable the spark-only tests

Reply via email to