[jira] [Commented] (HIVE-7158) Use Tez auto-parallelism in Hive

Lefty Leverenz (JIRA) Tue, 27 Jan 2015 03:11:58 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293368#comment-14293368
 ]


Lefty Leverenz commented on HIVE-7158:
--------------------------------------

Doc done, removing TODOC14 label.

* [hive.exec.reducers.bytes.per.reducer | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.bytes.per.reducer]
* [hive.exec.reducers.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.max]
* [Configuration Properties -- Tez | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]
** [hive.tez.auto.reducer.parallelism | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.tez.auto.reducer.parallelism]
** [hive.tez.max.partition.factor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.tez.max.partition.factor]
** [hive.tez.min.partition.factor | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.tez.min.partition.factor]

([~gopalv]:  The Tez section in "Configuration Properties" has a link to 
http://tez.apache.org, and the "Hive on Tez" wikidoc has two links.)


> Use Tez auto-parallelism in Hive
> --------------------------------
>
>                 Key: HIVE-7158
>                 URL: https://issues.apache.org/jira/browse/HIVE-7158
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch, 
> HIVE-7158.4.patch, HIVE-7158.5.patch
>
>
> Tez can optionally sample data from a fraction of the tasks of a vertex and 
> use that information to choose the number of downstream tasks for any given 
> scatter gather edge.
> Hive estimates the count of reducers by looking at stats and estimates for 
> each operator in the operator pipeline leading up to the reducer. However, if 
> this estimate turns out to be too large, Tez can rein in the resources used 
> to compute the reducer.
> It does so by combining partitions of the upstream vertex. It cannot, 
> however, add reducers at this stage.
> I'm proposing to let users specify whether they want to use auto-parallelism 
> or not. If they do, there will be scaling factors to determine the max and min 
> reducers Tez can choose from. We will then partition by max reducers, letting 
> Tez sample and rein in the count down to the specified min.
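
To make the scaling-factor idea in the description concrete, here is a minimal sketch of how the min and max reducer bounds could be derived from Hive's estimate. It is not taken from the patch: the helper name reducer_bounds is hypothetical, and the multiply-then-cap arithmetic, including the cap at hive.exec.reducers.max, is an assumption based on the description above. The sample values are for illustration only.

```python
def reducer_bounds(estimated_reducers, min_factor, max_factor, max_reducers):
    """Return (low, high): the reducer range Tez may choose from.

    Hypothetical helper for illustration only; not Hive's actual code.
    """
    # Lower bound: scale the estimate down, but never below one reducer.
    low = max(1, int(estimated_reducers * min_factor))
    # Upper bound: scale the estimate up; data is partitioned this many ways.
    high = int(estimated_reducers * max_factor)
    # Assumption: both bounds are capped by the global reducer limit
    # (the role played by hive.exec.reducers.max).
    low = min(low, max_reducers)
    high = min(high, max_reducers)
    # Assumption: keep the range non-empty for very small estimates.
    high = max(low, high)
    return low, high


if __name__ == "__main__":
    # Illustrative values: estimate of 100 reducers, factors 0.25 and 2.0.
    low, high = reducer_bounds(100, min_factor=0.25, max_factor=2.0,
                               max_reducers=1009)
    print(low, high)  # prints "25 200"
```

With these inputs the upstream vertex would write 200 partitions, and Tez could combine them into as few as 25 reducers after sampling. It could not go above 200, since reducers cannot be added at that stage.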



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)