[ 
https://issues.apache.org/jira/browse/HIVE-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627405#comment-14627405
 ] 

Chengxiang Li commented on HIVE-11082:
--------------------------------------

It's easy to ignore the alias name during the comparison; what stops me from 
doing that is the execution logic afterward. The downstream operators 
distinguish their inputs by alias name, since the inputs are logically 
different tables, so we would lose the alias information if we combined the 
MapWorks.
One possible optimization is to cut the ReduceSinkOperator out into a separate 
MapWork, so that we could cache the previous MapWork, which includes the 
operator chain before the ReduceSinkOperator. This optimization requires Hive 
on Spark to support appendable MapWork, like MapWork --> MapWork --> 
ReduceWork, or MapWork --> ReduceWork --> MapWork. 
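
To make the alias concern concrete, here is a minimal sketch, assuming a toy 
plan graph in which works are identified by name and every edge carries the 
alias it feeds. MultiEdgePlan, Edge, connect and aliases are hypothetical 
names used only for illustration; this is not the SparkPlan API or the 
attached patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy plan graph, NOT Hive's SparkPlan.
// It allows several edges between the same parent and child,
// as long as each edge feeds a different alias.
class MultiEdgePlan {

    // One edge per (parent, child, alias) triple.
    static final class Edge {
        final String parent;
        final String child;
        final String alias;

        Edge(String parent, String child, String alias) {
            this.parent = parent;
            this.child = child;
            this.alias = alias;
        }
    }

    // Outgoing edges per parent work, in insertion order.
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    // Adds an edge; only an exact duplicate (same child AND same alias) is rejected.
    void connect(String parent, String child, String alias) {
        List<Edge> edges = outgoing.computeIfAbsent(parent, k -> new ArrayList<>());
        for (Edge e : edges) {
            if (e.child.equals(child) && e.alias.equals(alias)) {
                throw new IllegalStateException(
                    "Duplicate edge " + parent + " -> " + child + " for alias " + alias);
            }
        }
        edges.add(new Edge(parent, child, alias));
    }

    // Returns the aliases that the given parent feeds into the given child.
    List<String> aliases(String parent, String child) {
        List<String> result = new ArrayList<>();
        for (Edge e : outgoing.getOrDefault(parent, new ArrayList<>())) {
            if (e.child.equals(child)) {
                result.add(e.alias);
            }
        }
        return result;
    }
}
```

In a self join, a single cached MapWork could then be connected to the same 
ReduceWork twice, once per alias, and the downstream operators could still 
tell the two inputs apart.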

> Support multi edge between nodes in SparkPlan[Spark Branch]
> -----------------------------------------------------------
>
>                 Key: HIVE-11082
>                 URL: https://issues.apache.org/jira/browse/HIVE-11082
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>         Attachments: HIVE-11082.1-spark.patch
>
>
> For the dynamic RDD caching optimization, we found that SparkPlan::connect 
> throws an exception when we try to combine 2 works with the same child. 
> Supporting multiple edges between nodes in SparkPlan would help enable 
> dynamic RDD caching in more use cases, like self join and self union.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
