[jira] [Comment Edited] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

liyunzhang_intel (JIRA) Tue, 20 Jun 2017 19:10:13 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837 ]


liyunzhang_intel edited comment on HIVE-11297 at 6/21/17 2:09 AM:
------------------------------------------------------------------

[~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e), 
ran the query I posted above, and printed the operator tree in 
SplitOpTreeForDPP#process:
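{code}
// ...
/** print the operator tree **/
List<TableScanOperator> tableScanList = new ArrayList<>();
// stack.get(0) is assumed to be the TableScanOperator at the root of the walk
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree: " + Operator.toString(tableScanList));
/** print the operator tree **/

// Walk up from the pruning sink until reaching the first operator
// with more than one child, i.e. the point where the DPP branch splits off.
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}
// ...
{code}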

The resulting operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}
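To illustrate what the parent walk in SplitOpTreeForDPP#process does on the tree printed above, here is a minimal, self-contained Java sketch. It uses a hypothetical `Op` class as a stand-in for Hive's `Operator` (only a name plus parent/child links), rebuilds the printed tree, prints it as root-to-leaf paths in the same `A-B-C` style, and then walks up from SPARKPRUNINGSINK[20]. Assuming the tree shape shown, the walk should stop at FIL[17], the first ancestor with more than one child.

```java
import java.util.ArrayList;
import java.util.List;

public class BranchPointDemo {

  /** Hypothetical stand-in for Hive's Operator: a name plus parent/child links. */
  static final class Op {
    final String name;
    final List<Op> parents = new ArrayList<>();
    final List<Op> children = new ArrayList<>();

    Op(String name) { this.name = name; }

    /** Links a new child under this op and returns the child, for chaining. */
    Op add(Op child) {
      children.add(child);
      child.parents.add(this);
      return child;
    }

    int getNumChild() { return children.size(); }
  }

  /** Collects every root-to-leaf path as "A-B-C", depth-first. */
  static List<String> paths(Op root) {
    List<String> out = new ArrayList<>();
    collect(root, root.name, out);
    return out;
  }

  private static void collect(Op op, String prefix, List<String> out) {
    if (op.children.isEmpty()) {
      out.add(prefix);
      return;
    }
    for (Op child : op.children) {
      collect(child, prefix + "-" + child.name, out);
    }
  }

  /**
   * Same idea as the loop in the snippet: follow the first parent until an op
   * with more than one child is found. Unlike the snippet, this version stops
   * at the root instead of calling get(0) on an empty parent list, and returns
   * null when no branch point exists.
   */
  static Op findBranchPoint(Op pruningSink) {
    Op op = pruningSink;
    while (op != null) {
      if (op.getNumChild() > 1) {
        break;
      }
      op = op.parents.isEmpty() ? null : op.parents.get(0);
    }
    return op;
  }

  public static void main(String[] args) {
    // Rebuild the printed tree: TS[1]-FIL[17] fans out into three branches.
    Op ts = new Op("TS[1]");
    Op fil = ts.add(new Op("FIL[17]"));
    fil.add(new Op("RS[4]")).add(new Op("JOIN[5]")).add(new Op("GBY[8]"))
        .add(new Op("RS[9]")).add(new Op("GBY[10]")).add(new Op("FS[12]"));
    Op sink20 = fil.add(new Op("SEL[18]")).add(new Op("GBY[19]"))
        .add(new Op("SPARKPRUNINGSINK[20]"));
    fil.add(new Op("SEL[21]")).add(new Op("GBY[22]"))
        .add(new Op("SPARKPRUNINGSINK[23]"));

    paths(ts).forEach(System.out::println);
    System.out.println("branch point: " + findBranchPoint(sink20).name);
  }
}
```

Running `main` should print the three paths exactly as listed above, followed by `branch point: FIL[17]`.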

Could you retest this in your environment? If the operator tree there matches 
what you described, then I think all the operator trees in 
spark_dynamic_partition_pruning.q.out will differ from the ones I generated in 
my environment.



was (Author: kellyzly):
[~csun]: I applied HIVE-11297.6.patch on the latest master branch (8c5f55e), 
ran the query I posted above, and printed the operator tree of filterOp in 
SplitOpTreeForDPP#process.


> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do 
> the table scan multiple times.
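The optimization described in the issue can be sketched as a prefix merge: if each pruning op tree is viewed as a chain of operators starting at the same table scan, chains with identical leading operators can share those nodes, so the scan runs once and fans out to several pruning sinks. The Java below is a hypothetical illustration, not the patch's implementation: operators are plain string labels (the names `CombineOpTrees`, `Node`, and labels such as `TS[dim]` are made up), and two operators count as equivalent only when their labels match, standing in for a real comparison of operator descriptors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineOpTrees {

  /** Hypothetical tree node; children are keyed by label so equal prefixes collapse. */
  static final class Node {
    final String label;
    final Map<String, Node> children = new LinkedHashMap<>();

    Node(String label) { this.label = label; }
  }

  /**
   * Merges linear op chains (root first) into a forest. Chains whose leading
   * ops have equal labels share those nodes, so a shared table scan appears once.
   */
  static List<Node> combine(List<List<String>> chains) {
    Map<String, Node> roots = new LinkedHashMap<>();
    for (List<String> chain : chains) {
      Map<String, Node> level = roots;
      for (String label : chain) {
        // Reuse an existing node with the same label, or create a new one.
        Node node = level.computeIfAbsent(label, Node::new);
        level = node.children;
      }
    }
    return new ArrayList<>(roots.values());
  }

  /** Counts nodes whose label starts with the given prefix, e.g. "TS" for table scans. */
  static int count(Node node, String prefix) {
    int n = node.label.startsWith(prefix) ? 1 : 0;
    for (Node child : node.children.values()) {
      n += count(child, prefix);
    }
    return n;
  }

  public static void main(String[] args) {
    // Two pruning chains over the same small table, one per partition column.
    List<List<String>> chains = Arrays.asList(
        Arrays.asList("TS[dim]", "FIL[p]", "SEL[a]", "GBY[a]", "SPARKPRUNINGSINK[a]"),
        Arrays.asList("TS[dim]", "FIL[p]", "SEL[b]", "GBY[b]", "SPARKPRUNINGSINK[b]"));
    List<Node> forest = combine(chains);

    int scansAfter = 0;
    for (Node root : forest) {
      scansAfter += count(root, "TS");
    }
    // Each input chain contains exactly one table scan.
    System.out.println("table scans before: " + chains.size());
    System.out.println("table scans after: " + scansAfter);
  }
}
```

With the two chains in `main`, it should report 2 table scans before merging and 1 after, with `FIL[p]` fanning out to both `SEL` branches. Keying children by label is what lets the shared prefix collapse; a real implementation would have to confirm that merged operators are genuinely equivalent rather than relying on a name.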



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
