[jira] [Comment Edited] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS

liyunzhang (JIRA) Wed, 01 Nov 2017 20:37:13 -0700

    [ https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235125#comment-16235125 ]


liyunzhang edited comment on HIVE-17486 at 11/2/17 3:35 AM:
------------------------------------------------------------

[~lirui]:
{quote}
My understanding is HoS also supports one Map connecting to multiple Reducers 
{quote}
In HoS, a Map work contains only one RS (ReduceSinkOperator). It is true that
in some cases one Map work is used by two Reducers in HoS. In HoT, however, two
RSs are allowed in one Map work, and those two RSs can send different data to
two different Reducers.
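To make the difference concrete, here is a toy Python model. It is a sketch under stated assumptions, not Hive code: ToyMapWork and ToyReduceSink are hypothetical classes invented for illustration. Setting max_sinks=1 stands in for the one-RS-per-Map behavior described for HoS, and the default (no limit) stands in for HoT. In the HoS case the single RS's output can still feed two Reducers, but both receive the same data. In the HoT case each RS keys the rows differently and feeds its own Reducer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass
class ToyReduceSink:
    """Hypothetical stand-in for an RS (ReduceSinkOperator): it keys each
    row and ships the (key, row) pairs to every reducer in `targets`."""
    key_of: Callable[[Row], Any]
    targets: Tuple[str, ...]


@dataclass
class ToyMapWork:
    """Hypothetical Map work. max_sinks=1 models HoS (one RS per Map);
    max_sinks=None models HoT (several RSs per Map)."""
    max_sinks: Optional[int] = None
    sinks: List[ToyReduceSink] = field(default_factory=list)

    def add_sink(self, sink: ToyReduceSink) -> None:
        # Refuse an extra RS once the engine-specific limit is reached.
        if self.max_sinks is not None and len(self.sinks) >= self.max_sinks:
            raise ValueError("only %d RS allowed in this Map work" % self.max_sinks)
        self.sinks.append(sink)

    def run(self, rows: List[Row]) -> Dict[str, List[Tuple[Any, Row]]]:
        """Return, per target reducer, the (key, row) pairs it receives."""
        out: Dict[str, List[Tuple[Any, Row]]] = {}
        for sink in self.sinks:
            pairs = [(sink.key_of(r), r) for r in rows]
            # Every target of the same RS gets the same pairs.
            for target in sink.targets:
                out.setdefault(target, []).extend(pairs)
        return out
```

For example, with rows [{"id": 1, "dept": "a"}, {"id": 2, "dept": "b"}], an HoS-style work whose single RS targets ("Reducer 2", "Reducer 3") gives both reducers identical (key, row) pairs, and adding a second RS raises ValueError. An HoT-style work with one RS keyed on id for Reducer 2 and another keyed on dept for Reducer 3 sends keys [1, 2] to Reducer 2 and ["a", "b"] to Reducer 3.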
{quote}
The problem here is HoS doesn't merge equivalent works as aggressively as HoT 
does. 
{quote}
Yes.


was (Author: kellyzly):
[~lirui]:
{quote}
My understanding is HoS also supports one Map connecting to multiple Reducers 
{quote}
In HoS, a Map work contains only one RS (ReduceSinkOperator). It is true that
in some cases one Map work is used by two Reducers in HoS. In HoT, however, two
RSs are allowed in one Map work, and those two RSs can send different data to
two different Reducers.

> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>            Priority: Major
>         Attachments: scanshare.after.svg, scanshare.before.svg
>
>
> In HIVE-16602 (Implement shared scans with Tez):
> Given a query plan, the goal is to identify scans on input tables that can be
> merged so the data is read only once. Optimization will be carried out at the
> physical level. In Hive on Spark, the result of a Spark work is cached if that
> work is used by more than one child Spark work. After SharedWorkOptimizer is
> enabled on the physical plan in HoS, identical table scans are merged into a
> single table scan whose result is used by more than one child Spark work.
> Thanks to the caching mechanism, the same computation does not need to be
> repeated.
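The two ideas in the description, merging identical scans and caching a work that has several children, can be sketched roughly in Python. Everything here is hypothetical and is not SharedWorkOptimizer's API: the plan format, share_scans, execute and the table name are made up. Two scans are treated as identical when they read the same table, whereas the real optimizer must also check that the scans are otherwise equivalent. The cache_shared flag loosely mimics HoS caching the result of a Spark work that has more than one child; with the flag off, this toy model re-reads the table once per child.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical plan format: scan work name -> (table it reads, child works).
Plan = Dict[str, Tuple[str, List[str]]]


def share_scans(plan: Plan) -> Plan:
    """Merge scan works that read the same table into the first such work,
    whose children become the union of the merged works' children.
    (Toy equivalence check: same table name only.)"""
    keeper_for_table: Dict[str, str] = {}
    merged: Plan = {}
    for name, (table, children) in plan.items():
        keeper = keeper_for_table.setdefault(table, name)
        if keeper == name:
            merged[name] = (table, list(children))  # copy so the input plan is untouched
        else:
            merged[keeper][1].extend(children)  # reattach children to the kept scan
    return merged


def execute(plan: Plan, read_table: Callable[[str], object],
            cache_shared: bool = True) -> Tuple[Dict[str, object], int]:
    """Deliver each scan's output to its children and count table reads.
    With cache_shared, a scan that has more than one child is read once and
    reused from the cache; otherwise it is re-read for every child."""
    reads = 0
    cache: Dict[str, object] = {}
    child_inputs: Dict[str, object] = {}
    for name, (table, children) in plan.items():
        for child in children:
            if name in cache:
                data = cache[name]
            else:
                data = read_table(table)
                reads += 1
                if cache_shared and len(children) > 1:
                    cache[name] = data
            child_inputs[child] = data
    return child_inputs, reads
```

With plan = {"Map 1": ("store_sales", ["Reducer 2"]), "Map 3": ("store_sales", ["Reducer 4"])}, execute reports 2 table reads. After share_scans the plan becomes a single "Map 1" work feeding both reducers, and execute reports 1 read, or 2 with cache_shared=False.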



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

