[ 
https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=462405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-462405
 ]

ASF GitHub Bot logged work on HIVE-23843:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jul/20 06:42
            Start Date: 23/Jul/20 06:42
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r459243033



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
       @ashutoshc this conversation was still not resolved - I was waiting for 
a response; I think we could have improved this patch further with just a 
small change.
   
   @rbalamohan we are batch-removing elements from the cache here, which does 
not happen in a regular LRU cache.
   
   if we have {{K}} cache slots and the stream starts with an element that 
occurs, say, {{N*K}} times, the average access count comes out at roughly 
{{N}}, so the bar for retaining a new cache element during flush rises to {{N}}.
   
   I think the counters of the retained entries should at least be reset to 0 - 
that would increase its effectiveness and neutralize long-term memory effects 
like the one above.
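
   A minimal sketch of the effect, not Hive code: it assumes that 
computeAvgAccess() is the total access count divided by the number of 
entries, and it replaces the aggregation buffers with a plain map from key to 
access count. The class EvictionSketch, its methods and the resetRetained 
flag are hypothetical names used only for illustration. With K = 4 and 
N = 10, the key "hot" is seen 40 times and then never again.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the operator's hash map; not Hive code.
final class EvictionSketch {

  // Assumed definition: total accesses divided by the number of entries.
  static int computeAvgAccess(Map<String, Integer> cache) {
    if (cache.isEmpty()) {
      return 0;
    }
    long total = 0;
    for (int count : cache.values()) {
      total += count;
    }
    return (int) (total / cache.size());
  }

  // Partial flush (the !all case): evict entries at or below the average and
  // keep the rest. When resetRetained is true, survivors restart from 0.
  static void flush(Map<String, Integer> cache, boolean resetRetained) {
    int avgAccess = computeAvgAccess(cache);
    Iterator<Map.Entry<String, Integer>> iter = cache.entrySet().iterator();
    while (iter.hasNext()) {
      Map.Entry<String, Integer> pair = iter.next();
      if (avgAccess >= 1 && pair.getValue() > avgAccess) {
        if (resetRetained) {
          pair.setValue(0);
        }
        continue;
      }
      iter.remove();
    }
  }

  // Records `times` accesses of `key`.
  static void access(Map<String, Integer> cache, String key, int times) {
    cache.merge(key, times, Integer::sum);
  }

  // Replays the scenario: "hot" arrives N*K = 40 times before the first
  // flush; a, b and c arrive 5 times each before the second flush.
  static Map<String, Integer> run(boolean resetRetained) {
    Map<String, Integer> cache = new LinkedHashMap<>();
    access(cache, "hot", 40);
    access(cache, "x", 1);
    access(cache, "y", 1);
    access(cache, "z", 1);
    flush(cache, resetRetained); // avg = 43 / 4 = 10, only "hot" survives
    access(cache, "a", 5);
    access(cache, "b", 5);
    access(cache, "c", 5);
    flush(cache, resetRetained);
    return cache;
  }
}
```

   Without the reset, the second flush sees an average of 55 / 4 = 13, so 
a, b and c are evicted and the stale "hot" entry stays. With the reset, the 
average is 15 / 4 = 3, so "hot" is evicted and a, b and c are retained.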




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 462405)
    Time Spent: 1h 50m  (was: 1h 40m)

> Improve key evictions in VectorGroupByOperator
> ----------------------------------------------
>
>                 Key: HIVE-23843
>                 URL: https://issues.apache.org/jira/browse/HIVE-23843
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> get into GC issues when multiple keys are involved in groupbys. It would be 
> good to provide an option to have LRU based eviction for 
> mapKeysAggregationBuffers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
