[
https://issues.apache.org/jira/browse/HIVE-28855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HIVE-28855:
----------------------------------
Labels: pull-request-available (was: )
> VectorGroupByOperator computes maxHashTblMemory using incorrect configuration.
> ------------------------------------------------------------------------------
>
> Key: HIVE-28855
> URL: https://issues.apache.org/jira/browse/HIVE-28855
> Project: Hive
> Issue Type: Bug
> Reporter: Seonggon Namgung
> Assignee: Seonggon Namgung
> Priority: Minor
> Labels: pull-request-available
>
> The VectorGroupByOperator computes maxHashTblMemory using an incorrect
> configuration parameter, leading to inconsistencies in HashTable memory
> management between GroupByOperator and VectorGroupByOperator.
> There are two configuration properties relevant to GroupBy HashTable memory
> management in Hive: hive.map.aggr.hash.force.flush.memory.threshold and
> hive.map.aggr.hash.percentmemory.
> GroupByOperator flushes the HashTable when:
> (1) The total memory usage exceeds the force flush threshold
> (hive.map.aggr.hash.force.flush.memory.threshold).
> (2) The estimated size of the hash table exceeds maxHashTblMemory.
> VectorGroupByOperator follows a slightly different approach. It flushes the
> HashTable when:
> (1) A GC has occurred since the last flush.
> (2) The estimated size of the hash table exceeds maxHashTblMemory.
> The problem is that GroupByOperator computes maxHashTblMemory using
> hive.map.aggr.hash.percentmemory, while VectorGroupByOperator computes it
> using hive.map.aggr.hash.force.flush.memory.threshold. To be consistent with
> GroupByOperator, VectorGroupByOperator should also use
> hive.map.aggr.hash.percentmemory when computing maxHashTblMemory.
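
To make the mismatch concrete, here is a minimal sketch of how the two settings lead to different hash table budgets for the same heap. It is not the Hive source: the class HashTableMemorySketch, the helper maxHashTblMemory, the 1 GiB heap, and the values 0.5 and 0.9 are assumptions chosen only for illustration, and the sketch assumes the budget is simply a fraction of the maximum heap.

```java
public class HashTableMemorySketch {

    // Hypothetical helper (not a Hive API): the hash table budget is taken
    // to be a fraction of the maximum heap size.
    static long maxHashTblMemory(long maxHeapBytes, double fraction) {
        return (long) (maxHeapBytes * fraction);
    }

    public static void main(String[] args) {
        long maxHeapBytes = 1L << 30;      // assumed 1 GiB heap
        double percentMemory = 0.5;        // illustrative value for hive.map.aggr.hash.percentmemory
        double forceFlushThreshold = 0.9;  // illustrative value for hive.map.aggr.hash.force.flush.memory.threshold

        // Budget derived from percentmemory, as the description says GroupByOperator does.
        long fromPercentMemory = maxHashTblMemory(maxHeapBytes, percentMemory);

        // Budget derived from the force flush threshold, as the description says
        // VectorGroupByOperator currently does.
        long fromForceFlushThreshold = maxHashTblMemory(maxHeapBytes, forceFlushThreshold);

        System.out.println("budget from percentmemory:         " + fromPercentMemory);
        System.out.println("budget from force flush threshold: " + fromForceFlushThreshold);
    }
}
```

With these assumed values the second budget is noticeably larger than the first, so the two operators would start flushing at different hash table sizes for the same configuration.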
--
This message was sent by Atlassian Jira
(v8.20.10#820010)