[ https://issues.apache.org/jira/browse/HIVE-23843?focusedWorklogId=461478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461478 ]

ASF GitHub Bot logged work on HIVE-23843:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 09:05
            Start Date: 21/Jul/20 09:05
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1250:
URL: https://github.com/apache/hive/pull/1250#discussion_r457944846



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##########
@@ -561,17 +590,25 @@ private void flush(boolean all) throws HiveException {
             maxHashTblMemory/1024/1024,
             gcCanary.get() == null ? "dead" : "alive"));
       }
+      int avgAccess = computeAvgAccess();
 
       /* Iterate the global (keywrapper,aggregationbuffers) map and emit
        a row for each key */
       Iterator<Map.Entry<KeyWrapper, VectorAggregationBufferRow>> iter =
           mapKeysAggregationBuffers.entrySet().iterator();
       while(iter.hasNext()) {
         Map.Entry<KeyWrapper, VectorAggregationBufferRow> pair = iter.next();
+        if (!all && avgAccess >= 1) {
+          // Retain entries when access pattern is > than average access
+          if (pair.getValue().getAccessCount() > avgAccess) {

Review comment:
      The new patch flattens the LRU logic into the operator itself. I liked 
the earlier pluggable approach better, since it could let us plug in LFRU 
later, which might be the best fit for this job.
   
   I believe the change was made because we might not want to plug in 
different kinds of algorithms here, which is fine.
   However, I think we can shift toward an LFRU-like operation by penalizing 
entries that are kept:
   * reset the access count of kept entries to 0, or
   * multiply it by 0.5 or a similar factor (the factor could be placed in conf).
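The retention check in the hunk above (`!all && avgAccess >= 1`, keep entries whose access count exceeds the average), combined with the decay suggested in the comment, might look roughly like the sketch below. It is a minimal, self-contained illustration, not Hive code: a plain `String` key and an `Integer` count stand in for `KeyWrapper` and `VectorAggregationBufferRow`, `computeAvgAccess` is assumed to be an integer mean, and the class name `DecayingFlushSketch` and the `decayFactor` parameter are hypothetical. A `decayFactor` of 0.0 corresponds to the first bullet (reset to 0), and 0.5 corresponds to the second.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive's implementation): a key -> access-count table whose
 * partial flush keeps entries accessed more often than average and decays their counts.
 */
class DecayingFlushSketch {
  private final Map<String, Integer> accessCounts = new HashMap<>();
  // Hypothetical config value: 0.0 resets kept counts to 0, 0.5 halves them.
  private final double decayFactor;

  DecayingFlushSketch(double decayFactor) {
    this.decayFactor = decayFactor;
  }

  /** Records one access of a key (stands in for updating its aggregation buffer). */
  public void access(String key) {
    accessCounts.merge(key, 1, Integer::sum);
  }

  /** Assumed definition: integer mean of all access counts, 0 for an empty table. */
  int computeAvgAccess() {
    if (accessCounts.isEmpty()) {
      return 0;
    }
    long total = 0;
    for (int c : accessCounts.values()) {
      total += c;
    }
    return (int) (total / accessCounts.size());
  }

  /**
   * Removes and returns the emitted keys. With all == true every entry is emitted;
   * otherwise entries above the average are kept, with their counts decayed.
   */
  public List<String> flush(boolean all) {
    int avgAccess = computeAvgAccess();
    List<String> emitted = new ArrayList<>();
    Iterator<Map.Entry<String, Integer>> iter = accessCounts.entrySet().iterator();
    while (iter.hasNext()) {
      Map.Entry<String, Integer> pair = iter.next();
      if (!all && avgAccess >= 1 && pair.getValue() > avgAccess) {
        // Keep the hot entry, but penalize it so that old popularity fades.
        pair.setValue((int) (pair.getValue() * decayFactor));
        continue;
      }
      emitted.add(pair.getKey());
      iter.remove();
    }
    return emitted;
  }

  public Integer countOf(String key) {
    return accessCounts.get(key);
  }

  public int size() {
    return accessCounts.size();
  }
}
```

Decaying the counts, rather than carrying raw totals across flushes, means an entry that was hot early in the task must keep being accessed to survive later partial flushes. This recency weighting is what moves the frequency-only check toward LFRU-like behaviour.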
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 461478)
    Time Spent: 1.5h  (was: 1h 20m)

> Improve key evictions in VectorGroupByOperator
> ----------------------------------------------
>
>                 Key: HIVE-23843
>                 URL: https://issues.apache.org/jira/browse/HIVE-23843
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Keys in {{mapKeysAggregationBuffers}} are evicted in random order. Tasks also 
> run into GC issues when multiple keys are involved in group-bys. It would be 
> good to provide an option for LRU-based eviction of 
> mapKeysAggregationBuffers.
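For reference, LRU-based eviction of a key map is commonly built in Java on `LinkedHashMap` in access-order mode, overriding `removeEldestEntry`. The sketch below is a generic illustration under that assumption, not the approach taken in the PR (the diff above tracks access counts instead). The class name `LruKeyMap`, the `maxEntries` bound, and the `evicted` list (standing in for emitting the evicted key's aggregation row) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: an LRU-ordered map. When the size exceeds maxEntries,
 * the least recently accessed entry is evicted and its key recorded.
 */
class LruKeyMap<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries; // hypothetical bound, e.g. derived from a memory budget
  // Stands in for flushing the evicted key's aggregation buffer downstream.
  final List<K> evicted = new ArrayList<>();

  LruKeyMap(int maxEntries) {
    // accessOrder = true: get() and put() move an entry to the most-recently-used end.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Invoked after each insertion; returning true removes the least recently used entry.
    if (size() > maxEntries) {
      evicted.add(eldest.getKey());
      return true;
    }
    return false;
  }
}
```

For example, with `maxEntries = 2`, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `b` is the least recently used entry at that point.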



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
