[ 
https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308995#comment-16308995
 ] 

liyunzhang commented on HIVE-17486:
-----------------------------------

[~stakiar]:
The original purpose of changing M->R to M->M->R was to let 
CombineEquivalentWorkResolver combine identical Maps. For example, given this 
logical plan:
{code}
TS[0]-FIL[52]-SEL[2]-GBY[3]-RS[4]-GBY[5]-RS[42]-JOIN[48]-SEL[49]-LIM[50]-FS[51]
TS[1]-FIL[53]-SEL[9]-GBY[10]-RS[11]-GBY[12]-RS[43]-JOIN[48]
{code}
the physical plan is:
{code}
Map1:TS[0]
Map2:TS[1]
Map3:FIL[52]-SEL[2]-GBY[3]-RS[4]
Map4:FIL[53]-SEL[9]-GBY[10]-RS[11]
Reducer1:GBY[5]-RS[42]-JOIN[48]-SEL[49]-LIM[50]-FS[51]
Reducer2:GBY[12]-RS[43]
{code}
{{CombineEquivalentWorkResolver}} combines identical Maps. In the above 
case, Map2 would be removed because TS\[1\] is the same as TS\[0\].
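
To illustrate the idea, here is a minimal sketch of combining equivalent works. It is not Hive code: {{Work}}, {{signature}}, {{combine_equivalent}} and the table name {{t}} are hypothetical names, and it assumes two works are equivalent when they have the same signature and the same parents.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    name: str
    signature: str  # hypothetical: canonical description of what the work computes
    parents: list = field(default_factory=list)


def combine_equivalent(works):
    """Drop works that duplicate an earlier work and rewire their children.

    Assumes `works` is ordered so that parents come before children.
    """
    kept = {}      # (signature, parent names) -> name of the surviving work
    replaced = {}  # name of a removed work -> name of the surviving work
    result = []
    for w in works:
        # Point at the survivor if a parent was already removed.
        parents = tuple(replaced.get(p, p) for p in w.parents)
        key = (w.signature, parents)
        if key in kept:
            replaced[w.name] = kept[key]
        else:
            kept[key] = w.name
            result.append(Work(w.name, w.signature, list(parents)))
    return result


# The physical plan above, assuming TS[0] and TS[1] scan the same table "t".
plan = [
    Work("Map1", "TS(t)"),
    Work("Map2", "TS(t)"),
    Work("Map3", "FIL[52]-SEL[2]-GBY[3]-RS[4]", ["Map1"]),
    Work("Map4", "FIL[53]-SEL[9]-GBY[10]-RS[11]", ["Map2"]),
]
combined = combine_equivalent(plan)
```

With this input, Map2 is dropped and Map4 reads from Map1 instead.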

But after finishing the code, I found that it is not necessary to combine 
TS\[0\] and TS\[1\] this way. {{MapInput}} is responsible for the TS, so I only 
need to generate the same MapInput for TS\[0\] and TS\[1\]. For more detail, see 
HIVE-17486.5.patch.
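
One way to picture "generate the same MapInput for TS\[0\] and TS\[1\]" is a cache keyed by the scanned table. This is only a sketch under that assumption and not what the patch necessarily does: the {{MapInput}} class below is a stand-in, and {{MapInputCache}} and the table names are hypothetical.

```python
class MapInput:
    """Stand-in for the object that reads a table for a table scan."""

    def __init__(self, table):
        self.table = table


class MapInputCache:
    """Hypothetical cache: equal table scans get the same MapInput object."""

    def __init__(self):
        self._by_table = {}

    def get(self, table):
        # Create the input once per table, then hand back the same object.
        if table not in self._by_table:
            self._by_table[table] = MapInput(table)
        return self._by_table[table]


cache = MapInputCache()
input_ts0 = cache.get("t")  # TS[0], assuming it scans table "t"
input_ts1 = cache.get("t")  # TS[1], assuming it scans the same table
input_other = cache.get("u")
```

Here {{input_ts0}} and {{input_ts1}} are the same object, so the table is read once, while a scan of a different table gets its own input.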


> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>         Attachments: HIVE-17486.1.patch, HIVE-17486.2.patch, 
> HIVE-17486.3.patch, HIVE-17486.4.patch, explain.28.share.false, 
> explain.28.share.true, scanshare.after.svg, scanshare.before.svg
>
>
> HIVE-16602 implemented shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be 
> merged so the data is read only once. The optimization is carried out at the 
> physical level. Hive on Spark caches the result of a spark work if that 
> spark work is used by more than 1 child spark work. After SharedWorkOptimizer 
> is enabled in the physical plan in HoS, identical table scans are merged into 1 
> table scan. The result of this table scan is used by more than 1 child spark 
> work, so the cache mechanism avoids repeating the same computation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
