Seonggon Namgung created HIVE-28855:
---------------------------------------
Summary: VectorGroupByOperator computes maxHashTblMemory using incorrect configuration.
Key: HIVE-28855
URL: https://issues.apache.org/jira/browse/HIVE-28855
Project: Hive
Issue Type: Bug
Reporter: Seonggon Namgung
Assignee: Seonggon Namgung
The VectorGroupByOperator computes maxHashTblMemory using an incorrect
configuration parameter, leading to inconsistencies in HashTable memory
management between GroupByOperator and VectorGroupByOperator.
There are two configuration properties relevant to GroupBy HashTable memory
management in Hive: hive.map.aggr.hash.force.flush.memory.threshold and
hive.map.aggr.hash.percentmemory.
GroupByOperator flushes the HashTable when:
(1) The total memory usage exceeds the force flush threshold
(hive.map.aggr.hash.force.flush.memory.threshold).
(2) The estimated size of the hash table exceeds maxHashTblMemory.
VectorGroupByOperator follows a slightly different approach. It flushes the
HashTable when:
(1) A GC has occurred since the last flush.
(2) The estimated size of the hash table exceeds maxHashTblMemory.
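The two flush rules described above can be sketched side by side as plain predicates. This is an illustration only, not Hive's code: the class and method names are hypothetical, the conditions in each list are assumed to be alternatives (either one triggers a flush), and the force flush threshold is assumed to be a fraction of the maximum heap.

```java
/** Illustrative sketch only; these are not Hive's actual classes or method names. */
final class FlushRules {
    private FlushRules() {}

    /**
     * Row-mode rule (GroupByOperator as described above): flush when total
     * memory usage exceeds the force flush threshold, or when the estimated
     * hash table size exceeds maxHashTblMemory.
     * Assumption: the threshold is a fraction of the maximum heap.
     */
    static boolean rowModeShouldFlush(long usedHeapBytes, long maxHeapBytes,
                                      float forceFlushThreshold,
                                      long estimatedTableBytes, long maxHashTblMemory) {
        boolean heapPressure =
            usedHeapBytes > (long) (maxHeapBytes * (double) forceFlushThreshold);
        boolean tableTooBig = estimatedTableBytes > maxHashTblMemory;
        return heapPressure || tableTooBig;
    }

    /**
     * Vectorized rule (VectorGroupByOperator as described above): flush when
     * a GC has occurred since the last flush, or when the estimated hash
     * table size exceeds maxHashTblMemory.
     */
    static boolean vectorModeShouldFlush(boolean gcSinceLastFlush,
                                         long estimatedTableBytes, long maxHashTblMemory) {
        return gcSinceLastFlush || estimatedTableBytes > maxHashTblMemory;
    }
}
```

Both rules share condition (2), so both depend on maxHashTblMemory being computed consistently.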
The problem is that GroupByOperator computes maxHashTblMemory from
hive.map.aggr.hash.percentmemory, while VectorGroupByOperator computes it from
hive.map.aggr.hash.force.flush.memory.threshold. As a result, the same
hash-table size check uses a different limit in each operator. Like
GroupByOperator, VectorGroupByOperator should use
hive.map.aggr.hash.percentmemory when computing maxHashTblMemory.
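To show why the choice of property matters, here is a minimal sketch of the budget computation. It assumes maxHashTblMemory is the configured fraction multiplied by the maximum heap size; the class, the method, and the use of a plain Map in place of Hive's configuration object are hypothetical, and the values 0.5 and 0.9 and the 1000-byte heap are chosen purely for illustration.

```java
import java.util.Map;

/** Illustrative sketch only; not Hive's actual implementation. */
final class HashTableBudget {
    static final String PERCENT_MEMORY = "hive.map.aggr.hash.percentmemory";
    static final String FORCE_FLUSH_THRESHOLD =
        "hive.map.aggr.hash.force.flush.memory.threshold";

    private HashTableBudget() {}

    /**
     * Assumption: the hash table budget is the configured fraction of the
     * maximum heap. The key argument selects which property supplies it.
     */
    static long maxHashTblMemory(Map<String, String> conf, String key, long maxHeapBytes) {
        double fraction = Double.parseDouble(conf.get(key));
        return (long) (maxHeapBytes * fraction);
    }

    public static void main(String[] args) {
        // Illustrative values, not read from a real cluster.
        Map<String, String> conf = Map.of(
            PERCENT_MEMORY, "0.5",
            FORCE_FLUSH_THRESHOLD, "0.9");
        long maxHeapBytes = 1000L;

        // Budget derived from the property GroupByOperator uses.
        long expected = maxHashTblMemory(conf, PERCENT_MEMORY, maxHeapBytes);
        // Budget derived from the property this issue reports as wrong.
        long reported = maxHashTblMemory(conf, FORCE_FLUSH_THRESHOLD, maxHeapBytes);

        System.out.println("budget from percentmemory:         " + expected);
        System.out.println("budget from force flush threshold: " + reported);
    }
}
```

Under these assumed values the two properties give different budgets, so the vectorized operator would let its hash table grow larger than the row-mode operator before condition (2) fires.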