[
https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on HIVE-17935 started by Andrew Sherman.
---------------------------------------------
> Turn on hive.optimize.sort.dynamic.partition by default
> -------------------------------------------------------
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
> Issue Type: Bug
> Reporter: Andrew Sherman
> Assignee: Andrew Sherman
> Attachments: HIVE-17935.1.patch
>
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for
> Hive’s dynamic partitioning feature. It was originally implemented in
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this
> optimization, the dynamic partition columns and bucketing columns (in case of
> bucketed tables) are sorted before being fed to the reducers. Since the
> partitioning and bucketing columns are sorted, each reducer needs to keep only
> one record writer open at any time, thereby reducing the memory pressure on the
> reducers. There were some early problems with this optimization, and it was
> disabled by default in HiveConf in
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then,
> setting hive.optimize.sort.dynamic.partition=true has been used to solve
> problems where dynamic partitioning produces (1) too many small files on
> HDFS, which is bad for the cluster and can increase overhead for future Hive
> queries over those partitions, and (2) OOM issues in the map tasks because a
> task is trying to simultaneously write to 100 different files.
> It now seems that the feature is probably mature enough that it can be
> enabled by default.
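
The memory argument in the description can be illustrated with a small simulation. This is a rough sketch only, not Hive code: the function name `peak_open_writers`, the `(partition, value)` row layout, and the 100-partition test data are all hypothetical, and a "writer" is modelled as nothing more than an entry in a set. It assumes that a task receiving unsorted rows must keep a writer open for every partition it has seen, while a task receiving rows sorted by partition key can close the previous writer as soon as the key changes.

```python
def peak_open_writers(rows, sorted_input):
    """Return the largest number of writers open at once (toy model).

    rows: iterable of (partition_key, value) pairs (hypothetical layout).
    sorted_input: if True, rows are sorted by partition key first and the
    previous partition's writer is closed whenever the key changes.
    """
    if sorted_input:
        rows = sorted(rows, key=lambda row: row[0])

    open_writers = set()
    current_partition = None
    peak = 0
    for partition, _value in rows:
        if sorted_input and partition != current_partition:
            # Sorted input guarantees the previous partition is complete,
            # so its writer can be closed before the next one is opened.
            open_writers.clear()
            current_partition = partition
        open_writers.add(partition)  # open (or reuse) the writer
        peak = max(peak, len(open_writers))
    return peak


# 1000 rows spread round-robin over 100 hypothetical partitions.
rows = [(f"p{i % 100}", i) for i in range(1000)]
unsorted_peak = peak_open_writers(rows, sorted_input=False)
sorted_peak = peak_open_writers(rows, sorted_input=True)
```

Under these assumptions the unsorted run ends up holding one writer per partition, while the sorted run never holds more than one, which is the effect the description attributes to hive.optimize.sort.dynamic.partition.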