[jira] [Updated] (HIVE-20153) Count and Sum UDF consume more memory in Hive 2+

Aihua Xu (JIRA) Thu, 26 Jul 2018 13:13:08 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Aihua Xu updated HIVE-20153:
----------------------------
    Status: Patch Available  (was: Open)

patch-1: initialize uniqueObjects only in windowing distinct mode. 
That reduces memory consumption in all other cases.
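
For readers who have not opened the patch, the idea can be sketched roughly as follows. This is a minimal illustration and not the code in HIVE-20153.1.patch: the class LazyDistinctSketch, the CountBuffer type, and the newBuffer/iterate helpers are hypothetical stand-ins, and only the field name uniqueObjects comes from the issue. The assumption is that the set is needed only when distinct values must be tracked for a windowing function, so every other aggregation buffer can skip the allocation.

```java
import java.util.HashSet;
import java.util.Set;

public class LazyDistinctSketch {

    /** Simplified stand-in for an aggregation buffer; NOT Hive's actual class. */
    static final class CountBuffer {
        long count;
        // Stays null unless distinct tracking is requested (assumption of this sketch).
        Set<Object> uniqueObjects;
    }

    /** Allocates the set only for the (hypothetical) windowing distinct mode. */
    static CountBuffer newBuffer(boolean windowingDistinct) {
        CountBuffer buf = new CountBuffer();
        if (windowingDistinct) {
            buf.uniqueObjects = new HashSet<>();
        }
        return buf;
    }

    /** Counts a non-null value; in distinct mode each value is counted once. */
    static void iterate(CountBuffer buf, Object value) {
        if (value == null) {
            return; // count(col) ignores NULLs
        }
        if (buf.uniqueObjects == null) {
            buf.count++;
        } else if (buf.uniqueObjects.add(value)) {
            buf.count++;
        }
    }

    public static void main(String[] args) {
        CountBuffer plain = newBuffer(false);
        CountBuffer distinct = newBuffer(true);
        for (String v : new String[] {"a", "a", "b"}) {
            iterate(plain, v);
            iterate(distinct, v);
        }
        System.out.println(plain.count);    // prints 3
        System.out.println(distinct.count); // prints 2
    }
}
```

In this sketch a plain count() buffer holds only a long and a null reference, so the per-buffer HashSet cost is paid only when distinct tracking is actually requested.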

[~szehon] Could you test the patch to see whether it resolves the issue you 
encountered?

> Count and Sum UDF consume more memory in Hive 2+
> ------------------------------------------------
>
>                 Key: HIVE-20153
>                 URL: https://issues.apache.org/jira/browse/HIVE-20153
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 2.3.2
>            Reporter: Szehon Ho
>            Assignee: Aihua Xu
>            Priority: Major
>         Attachments: HIVE-20153.1.patch, Screen Shot 2018-07-12 at 6.41.28 
> PM.png
>
>
> While testing Hive 2, we noticed that queries with many count() and sum() 
> aggregations run out of memory on the Hadoop side, although the same queries 
> worked in Hive 1. 
> For many queries we had to double the mapper memory setting (in our 
> particular case, mapreduce.map.java.opts from -Xmx2000M to -Xmx4000M), which 
> makes the upgrade to Hive 2 difficult.
> A heap dump shows that one of the main culprits is the field 'uniqueObjects' 
> in GenericUDAFSum and GenericUDAFCount, which was added to support window 
> functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
