rbalamohan commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459378925



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       For rollup, it should be the other way around. If retained entries are 
reset to zero, it could potentially get evicted in next iteration polluting the 
map. I guess the confusion is with strict LRU impl here. Intent is not to 
implement strict LRU, and evict one at time (earlier impl with LinkedHashMap 
followed that, but ends up with L1 cache misses and ends up with heavy object 
tracking). Current implementation is light weight, but does not change the 
pattern of evicting 10% on entries. Just adds logic to retain heavily accessed 
entries. 
   
   Potential further optimization is to reuse the evicted entities and reuse 
them for pooling (e.g for every keywrapper gets cloned internally via copyKey() 
in the map. This causes high mem pressure on certain queries). This can be 
added as an additional optimization in follow up jira.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org

Reply via email to