rbalamohan commented on a change in pull request #1250: URL: https://github.com/apache/hive/pull/1250#discussion_r459378925
########## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java ########## @@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException { maxHashTblMemory/1024/1024, gcCanary.get() == null ? "dead" : "alive")); } + int avgAccess = computeAvgAccess(); /* Iterate the global (keywrapper,aggregationbuffers) map and emit a row for each key */ Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter = mapKeysAggregationBuffers.entrySet().iterator(); while(iter.hasNext()) { Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next(); + if (!all && avgAccess >= 1) { + // Retain entries when access pattern is > than average access + if (pair.getValue().getAccessCount() > avgAccess) { Review comment: For rollup, it should be the other way around. If retained entries are reset to zero, it could potentially get evicted in next iteration polluting the map. I guess the confusion is with strict LRU impl here. Intent is not to implement strict LRU, and evict one at time (earlier impl with LinkedHashMap followed that, but ends up with L1 cache misses and ends up with heavy object tracking). Current implementation is light weight, but does not change the pattern of evicting 10% on entries. Just adds logic to retain heavily accessed entries. Potential further optimization is to reuse the evicted entities and reuse them for pooling (e.g for every keywrapper gets cloned internally via copyKey() in the map. This causes high mem pressure on certain queries). This can be added as an additional optimization in follow up jira. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org