[ 
https://issues.apache.org/jira/browse/HIVE-28855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28855:
----------------------------------
    Labels: pull-request-available  (was: )

> VectorGroupByOperator computes maxHashTblMemory using incorrect configuration.
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-28855
>                 URL: https://issues.apache.org/jira/browse/HIVE-28855
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Minor
>              Labels: pull-request-available
>
> The VectorGroupByOperator computes maxHashTblMemory using an incorrect 
> configuration parameter, leading to inconsistencies in HashTable memory 
> management between GroupByOperator and VectorGroupByOperator.
> There are two configuration properties relevant to GroupBy HashTable memory 
> management in Hive: hive.map.aggr.hash.force.flush.memory.threshold and 
> hive.map.aggr.hash.percentmemory.
> GroupByOperator flushes the HashTable when:
> (1) The total memory usage exceeds the force flush threshold 
> (hive.map.aggr.hash.force.flush.memory.threshold).
> (2) The estimated size of the hash table exceeds maxHashTblMemory.
> VectorGroupByOperator follows a slightly different approach. It flushes the 
> HashTable when:
> (1) A GC has occurred since the last flush.
> (2) The estimated size of the hash table exceeds maxHashTblMemory.
> The problem is that GroupByOperator computes maxHashTblMemory from 
> hive.map.aggr.hash.percentmemory, while VectorGroupByOperator computes it 
> from hive.map.aggr.hash.force.flush.memory.threshold. VectorGroupByOperator 
> should use hive.map.aggr.hash.percentmemory when computing maxHashTblMemory, 
> as GroupByOperator does.
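
The mismatch described above can be illustrated with a minimal Java sketch. It is not the Hive source: the class name, the method names, the plain Map standing in for the Hive configuration, and the numeric values are all hypothetical, and it assumes that maxHashTblMemory is the configured fraction multiplied by the maximum heap size.

```java
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hive itself.
public class MaxHashTblMemorySketch {
    static final String PERCENT_MEMORY =
        "hive.map.aggr.hash.percentmemory";
    static final String FORCE_FLUSH_THRESHOLD =
        "hive.map.aggr.hash.force.flush.memory.threshold";

    // What the issue says GroupByOperator does, and what it proposes for
    // VectorGroupByOperator: size the limit from percentmemory.
    // Assumption: the limit is fraction * max heap bytes.
    static long fromPercentMemory(Map<String, Double> conf, long maxHeapBytes) {
        return (long) (conf.get(PERCENT_MEMORY) * maxHeapBytes);
    }

    // What the issue says VectorGroupByOperator does today: size the limit
    // from the force-flush threshold instead.
    static long fromForceFlushThreshold(Map<String, Double> conf, long maxHeapBytes) {
        return (long) (conf.get(FORCE_FLUSH_THRESHOLD) * maxHeapBytes);
    }

    public static void main(String[] args) {
        // Illustrative values, chosen only to make the two results differ.
        Map<String, Double> conf = Map.of(
            PERCENT_MEMORY, 0.5,
            FORCE_FLUSH_THRESHOLD, 0.9);
        long maxHeapBytes = 1L << 30; // 1 GiB, also illustrative

        System.out.println("limit from percentmemory:         "
            + fromPercentMemory(conf, maxHeapBytes));
        System.out.println("limit from force.flush.threshold: "
            + fromForceFlushThreshold(conf, maxHeapBytes));
    }
}
```

With these sample values the two operators would end up with different maxHashTblMemory limits for the same configuration, which is the inconsistency the issue reports.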



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
