[jira] [Updated] (HIVE-7685) Parquet memory manager

Dong Chen (JIRA) Mon, 29 Dec 2014 18:55:32 -0800

     [ https://issues.apache.org/jira/browse/HIVE-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dong Chen updated HIVE-7685:
----------------------------
    Attachment: HIVE-7685.1.patch

Hi [~brocknoland], 

Now that PARQUET-108 is resolved, I think the attached patch {{HIVE-7685.1.patch}} 
should be OK for Hive to use the Parquet memory manager. Could you please help 
review it?

This patch adds one parameter to HiveConf. Its name does not start with 
'hive.' because the parameter is actually defined in the Parquet project.
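For illustration only, here is a minimal sketch of how a Parquet-defined ratio parameter could turn into a per-JVM heap budget. The key name `parquet.memory.pool.ratio`, the default of 0.5, the validation range (0, 1], and the `ParquetMemoryPoolConfig` helper are all assumptions made for this example; check the patch and parquet-mr for the real name and default. `java.util.Properties` stands in for Hadoop's `Configuration` so the snippet compiles on its own.

```java
import java.util.Properties;

/** Hypothetical helper; the key and default below are assumptions, not taken from the patch. */
final class ParquetMemoryPoolConfig {
    // Assumed key name. Note that it deliberately lacks the "hive." prefix.
    static final String POOL_RATIO_KEY = "parquet.memory.pool.ratio";
    // Assumed default, for illustration only.
    static final float DEFAULT_POOL_RATIO = 0.5f;

    private ParquetMemoryPoolConfig() {}

    /** Reads the ratio, falling back to the default, and rejects values outside (0, 1]. */
    static float poolRatio(Properties conf) {
        String raw = conf.getProperty(POOL_RATIO_KEY);
        float ratio = (raw == null) ? DEFAULT_POOL_RATIO : Float.parseFloat(raw.trim());
        // The negated form also rejects NaN.
        if (!(ratio > 0f && ratio <= 1f)) {
            throw new IllegalArgumentException(POOL_RATIO_KEY + " must be in (0, 1], got " + ratio);
        }
        return ratio;
    }

    /** Heap bytes that all open Parquet writers in one JVM may share. */
    static long poolBytes(Properties conf, long maxHeapBytes) {
        return (long) (maxHeapBytes * (double) poolRatio(conf));
    }
}
```

In a task, the budget would then be something like `ParquetMemoryPoolConfig.poolBytes(props, Runtime.getRuntime().maxMemory())`.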

> Parquet memory manager
> ----------------------
>
>                 Key: HIVE-7685
>                 URL: https://issues.apache.org/jira/browse/HIVE-7685
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Brock Noland
>            Assignee: Dong Chen
>         Attachments: HIVE-7685.1.patch, HIVE-7685.1.patch.ready, 
> HIVE-7685.patch, HIVE-7685.patch.ready
>
>
> Similar to HIVE-4248, Parquet tries to write very large "row groups". 
> This causes Hive to run out of memory when writing dynamic partitions, because a 
> reducer may have many Parquet files open at a given time.
> As such, we should implement a memory manager which ensures that we don't run 
> out of memory due to writing too many row groups within a single JVM.
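The idea in the description can be sketched roughly as follows. Every open writer registers the row-group size it would like to use. When the sum of those requests exceeds a shared per-JVM pool, every writer is shrunk by the same factor (pool / total), so the combined row-group buffers fit in the pool. The class, its methods, and the string writer ids are hypothetical. This is not the actual parquet-mr `MemoryManager` API, only a sketch of the proportional-scaling technique under those assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-JVM memory manager for open Parquet writers (illustrative only). */
final class RowGroupMemoryManager {
    private final long poolBytes;
    // Row-group size each open writer asked for, keyed by a hypothetical writer id.
    private final Map<String, Long> requested = new HashMap<>();
    // Row-group size each writer should currently use after scaling.
    private final Map<String, Long> allotted = new HashMap<>();

    RowGroupMemoryManager(long poolBytes) {
        if (poolBytes <= 0) throw new IllegalArgumentException("poolBytes must be positive");
        this.poolBytes = poolBytes;
    }

    /** Registers a newly opened writer and rebalances all writers. */
    synchronized void addWriter(String writerId, long requestedRowGroupBytes) {
        if (requestedRowGroupBytes <= 0) {
            throw new IllegalArgumentException("requested row-group size must be positive");
        }
        requested.put(writerId, requestedRowGroupBytes);
        rebalance();
    }

    /** Unregisters a closed writer so the remaining writers can grow back. */
    synchronized void removeWriter(String writerId) {
        requested.remove(writerId);
        allotted.remove(writerId);
        rebalance();
    }

    /** Current row-group budget for a writer, or 0 if the writer is not registered. */
    synchronized long allottedBytes(String writerId) {
        return allotted.getOrDefault(writerId, 0L);
    }

    // If the total request exceeds the pool, scale every writer by pool / total.
    // Flooring each share keeps the sum of allotments at or below the pool.
    private void rebalance() {
        long total = 0;
        for (long bytes : requested.values()) total += bytes;
        double scale = (total > poolBytes) ? (double) poolBytes / total : 1.0;
        allotted.clear();
        for (Map.Entry<String, Long> e : requested.entrySet()) {
            allotted.put(e.getKey(), (long) Math.floor(e.getValue() * scale));
        }
    }
}
```

For example, with a 300-byte pool, three writers that each request 128 bytes are cut to 100 bytes apiece. Closing one of them restores the other two to 128. A real implementation would also need to push the new size into each writer's flush logic, which this sketch leaves out.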



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
