Seonggon Namgung created HIVE-28855:
---------------------------------------

             Summary: VectorGroupByOperator computes maxHashTblMemory using 
incorrect configuration.
                 Key: HIVE-28855
                 URL: https://issues.apache.org/jira/browse/HIVE-28855
             Project: Hive
          Issue Type: Bug
            Reporter: Seonggon Namgung
            Assignee: Seonggon Namgung


The VectorGroupByOperator computes maxHashTblMemory using an incorrect 
configuration parameter, leading to inconsistencies in HashTable memory 
management between GroupByOperator and VectorGroupByOperator.

There are two configuration properties relevant to GroupBy HashTable memory 
management in Hive: hive.map.aggr.hash.force.flush.memory.threshold and 
hive.map.aggr.hash.percentmemory.

GroupByOperator flushes the HashTable when:
(1) The total memory usage exceeds the force flush threshold 
(hive.map.aggr.hash.force.flush.memory.threshold).
(2) The estimated size of the hash table exceeds maxHashTblMemory.
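
The two GroupByOperator conditions above can be sketched roughly as follows. This is only an illustration, not the actual Hive code: the class, method, and parameter names are hypothetical, and it assumes the force flush threshold is a fraction of the maximum available memory.

```java
public class RowModeFlushSketch {
    /**
     * Hypothetical helper, not a Hive API: returns true when either of the
     * two flush conditions described for GroupByOperator holds.
     */
    static boolean shouldFlush(long usedMemory, long maxMemory,
                               double forceFlushThreshold,
                               long estimatedHashTableSize,
                               long maxHashTblMemory) {
        // (1) Total memory usage exceeds the force flush threshold
        //     (assumed here to be a fraction of maxMemory).
        boolean overForceFlush = usedMemory > (long) (maxMemory * forceFlushThreshold);
        // (2) The estimated hash table size exceeds maxHashTblMemory.
        boolean overHashBudget = estimatedHashTableSize > maxHashTblMemory;
        return overForceFlush || overHashBudget;
    }
}
```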

VectorGroupByOperator follows a slightly different approach. It flushes the 
HashTable when:
(1) A GC has occurred since the last flush.
(2) The estimated size of the hash table exceeds maxHashTblMemory.
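
For comparison, the VectorGroupByOperator conditions might look like the sketch below. This report does not describe how the operator detects a GC, so the two GC counters here are placeholders; all names are hypothetical and this is not the actual Hive code.

```java
public class VectorFlushSketch {
    /**
     * Hypothetical helper, not a Hive API: returns true when either of the
     * two flush conditions described for VectorGroupByOperator holds.
     */
    static boolean shouldFlush(long gcCountNow, long gcCountAtLastFlush,
                               long estimatedHashTableSize,
                               long maxHashTblMemory) {
        // (1) A GC has occurred since the last flush. The counters stand in
        //     for whatever GC-detection mechanism the operator really uses.
        boolean gcSinceLastFlush = gcCountNow > gcCountAtLastFlush;
        // (2) The estimated hash table size exceeds maxHashTblMemory.
        boolean overHashBudget = estimatedHashTableSize > maxHashTblMemory;
        return gcSinceLastFlush || overHashBudget;
    }
}
```

Condition (2) is the same in both operators, which is why the value of maxHashTblMemory matters for consistency between them.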

The problem is that GroupByOperator computes maxHashTblMemory using 
hive.map.aggr.hash.percentmemory, while VectorGroupByOperator computes it using 
hive.map.aggr.hash.force.flush.memory.threshold. VectorGroupByOperator should 
instead use hive.map.aggr.hash.percentmemory when computing maxHashTblMemory, 
as GroupByOperator does.
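
To illustrate the effect, the sketch below assumes maxHashTblMemory is computed as a fraction of the maximum available memory. The helper name and the values 0.5 and 0.9 are illustrative assumptions, not taken from the Hive source or from this report.

```java
public class MaxHashTblMemorySketch {
    /** Hypothetical helper: hash table budget as a fraction of available memory. */
    static long maxHashTblMemory(long maxMemory, double fraction) {
        return (long) (maxMemory * fraction);
    }

    public static void main(String[] args) {
        long maxMemory = 1000L;            // illustrative value, in arbitrary units
        double percentMemory = 0.5;        // stands in for hive.map.aggr.hash.percentmemory
        double forceFlushThreshold = 0.9;  // stands in for hive.map.aggr.hash.force.flush.memory.threshold

        // Budget derived from percentmemory, as described for GroupByOperator.
        long rowModeBudget = maxHashTblMemory(maxMemory, percentMemory);
        // Budget derived from the force flush threshold, as described for
        // VectorGroupByOperator before the fix.
        long vectorBudget = maxHashTblMemory(maxMemory, forceFlushThreshold);

        System.out.println("row-mode budget:   " + rowModeBudget);
        System.out.println("vectorized budget: " + vectorBudget);
    }
}
```

Under these assumptions the vectorized operator would allow its hash table to grow larger than the row-mode operator would before condition (2) triggers a flush, even though both read the same configuration.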



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
