[ https://issues.apache.org/jira/browse/HIVE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293368#comment-14293368 ]
Lefty Leverenz commented on HIVE-7158:
--------------------------------------

Doc done, removing TODOC14 label.

* [hive.exec.reducers.bytes.per.reducer | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.bytes.per.reducer]
* [hive.exec.reducers.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.reducers.max]
* [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez]
** [hive.tez.auto.reducer.parallelism | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.tez.auto.reducer.parallelism]
** [hive.tez.max.partition.factor | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.tez.max.partition.factor]
** [hive.tez.min.partition.factor | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.tez.min.partition.factor]

([~gopalv]: The Tez section in "Configuration Properties" has a link to http://tez.apache.org, and the "Hive on Tez" wikidoc has two links.)

> Use Tez auto-parallelism in Hive
> --------------------------------
>
>                 Key: HIVE-7158
>                 URL: https://issues.apache.org/jira/browse/HIVE-7158
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7158.1.patch, HIVE-7158.2.patch, HIVE-7158.3.patch,
> HIVE-7158.4.patch, HIVE-7158.5.patch
>
>
> Tez can optionally sample data from a fraction of the tasks of a vertex and
> use that information to choose the number of downstream tasks for any given
> scatter gather edge.
> Hive estimates the count of reducers by looking at stats and estimates for
> each operator in the operator pipeline leading up to the reducer.
> However, if this estimate turns out to be too large, Tez can rein in the
> resources used to compute the reducer.
> It does so by combining partitions of the upstream vertex. It cannot,
> however, add reducers at this stage.
> I'm proposing to let users specify whether they want to use auto-parallelism
> or not. If they do, there will be scaling factors to determine the max and min
> number of reducers Tez can choose from. We will then partition by max
> reducers, letting Tez sample and rein in the count down to the specified min.
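
As a rough illustration of the proposal described above, the following Python sketch derives the reducer range from Hive's estimate and the two scaling factors, then clamps a runtime choice into that range. It is not Hive's or Tez's actual code: the function names, the truncation and clamping rules, and the sample numbers are assumptions made for illustration. Only the idea of a min factor, a max factor, a bytes-per-reducer target and a reducer cap comes from the issue text and the linked properties.

```python
import math


def reducer_bounds(estimated_reducers, min_factor, max_factor, reducers_max):
    """Return (min_reducers, max_reducers) that Tez may choose between.

    Assumed rule: scale Hive's estimate by each factor, truncate,
    keep at least one reducer, and never exceed the configured cap.
    """
    lo = max(1, int(estimated_reducers * min_factor))
    hi = max(1, int(estimated_reducers * max_factor))
    hi = min(hi, reducers_max)  # respect the overall reducer cap
    lo = min(lo, hi)            # keep the range well-formed
    return lo, hi


def choose_reducers(sampled_bytes, bytes_per_reducer, lo, hi):
    """Pick a reducer count from sampled upstream output size.

    The data is partitioned by `hi` up front, so the count can only be
    reduced from `hi` (by combining partitions), never raised above it.
    """
    wanted = math.ceil(sampled_bytes / bytes_per_reducer)
    return max(lo, min(hi, wanted))


lo, hi = reducer_bounds(100, min_factor=0.25, max_factor=2.0, reducers_max=1009)
print(lo, hi)
print(choose_reducers(50 * 256_000_000, 256_000_000, lo, hi))
```

With an estimate of 100 reducers and the sample factors 0.25 and 2.0, the data would be partitioned 200 ways and Tez could combine partitions down to as few as 25 reducers. A sampled size that calls for more than 200 reducers still yields 200, since reducers cannot be added at that stage.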