Re: [PR] HIVE-28855: VectorGroupByOperator should use getGroupByMemoryUsage() instead of getMemoryThreshold(). [hive]

via GitHub Tue, 08 Apr 2025 07:40:09 -0700


okumin commented on PR #5719:
URL: https://github.com/apache/hive/pull/5719#issuecomment-2786679837


   I also double-checked the content in the PR description. As you say, 
GroupByOperator and VectorGroupByOperator have some inconsistencies.
   - The source of the maximum memory for the operators: the non-vectorized 
version uses getMaxMemoryAvailable for both LLAP and Tez, while the vectorized 
version uses getMaxMemoryAvailable for LLAP and the max JVM heap for Tez
   - `hive.map.aggr.hash.force.flush.memory.threshold` vs 
`hive.map.aggr.hash.percentmemory` for hash table memory (numEntries * width)
   - The metric for total memory pressure: the non-vectorized version uses the 
used JVM heap, while the vectorized version uses a SoftReference
   - Only the vectorized version has a threshold for simple count
   
   This PR addresses the second problem. I think we can also standardize the 
first one. I'm not sure what to do about the third and fourth ones, as they 
seem to be intentional.
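
To make the second item in the list above concrete, here is a minimal sketch of how the same hash table estimate (numEntries * width) can pass or fail depending on which fraction of the memory budget it is compared against. This is not Hive's actual code: the class name, method names, and the fractions 0.5 and 0.9 are hypothetical and chosen only for illustration.

```java
// Hypothetical sketch, not Hive's implementation. All names and numbers
// here are assumptions made for illustration.
final class HashMemorySketch {
    private HashMemorySketch() {}

    /** Estimated hash table footprint in bytes: numEntries * width. */
    static long estimatedHashTableBytes(long numEntries, long entryWidthBytes) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(numEntries, entryWidthBytes);
    }

    /** True when the estimate exceeds the given fraction of the memory budget. */
    static boolean exceedsLimit(long numEntries, long entryWidthBytes,
                                long maxMemoryBytes, double fraction) {
        return estimatedHashTableBytes(numEntries, entryWidthBytes)
                > maxMemoryBytes * fraction;
    }

    public static void main(String[] args) {
        long maxMemoryBytes = 1_000_000_000L; // assumed memory budget
        long numEntries = 1_000_000L;         // assumed number of hash table entries
        long entryWidthBytes = 600L;          // assumed bytes per entry

        // Same estimate, two different fractions of the same budget.
        System.out.println(exceedsLimit(numEntries, entryWidthBytes, maxMemoryBytes, 0.5));
        System.out.println(exceedsLimit(numEntries, entryWidthBytes, maxMemoryBytes, 0.9));
    }
}
```

With these assumed numbers the estimate is 600,000,000 bytes, which is over the limit at a fraction of 0.5 but under it at 0.9, so two operators reading different settings would flush at different points for the same data.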


