[jira] [Commented] (DRILL-6310) limit batch size for hash aggregate

ASF GitHub Bot (JIRA) Tue, 19 Jun 2018 08:18:11 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517219#comment-16517219
 ]


ASF GitHub Bot commented on DRILL-6310:
---------------------------------------

ilooner commented on a change in pull request #1324: DRILL-6310: limit batch size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r196468069
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ##########
 @@ -1317,7 +1364,7 @@ private void checkGroupAndAggrValues(int incomingRowIdx) {
 
         useReservedValuesMemory(); // try to preempt an OOM by using the reserve
 
-        addBatchHolder(currentPartition);  // allocate a new (internal) values batch
+        addBatchHolder(currentPartition, getBatchSize());  // allocate a new (internal) values batch
 
 Review comment:
   Padma, I agree that we want to limit the size of output batches, and that
reducing the batch holder size is a great change. Having BatchHolders that
always hold 64K rows is not practical. My issue is with changing the batch
holder size dynamically. I think it adds complexity without a concrete benefit.
Since new data will keep being added to old BatchHolders, data will never
really go into a BatchHolder that was appropriately sized for it. Since the
complex approach of dynamically changing BatchHolder sizes can't give us an
accurate solution anyway, I think we should go with the simpler approach. We
can still use all your changes; I just think we shouldn't keep updating the
BatchHolder size.
   
   The added complexity is the extra overhead of computing indexes in the
hash table, and it will also complicate the refactored memory calculations I
am adding.
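
   To illustrate the index-computation point, here is a minimal sketch. It is
not Drill's actual code: the class `BatchIndexSketch`, its methods, and the
4096-row holder size are all hypothetical. It assumes that with a fixed
power-of-two holder size a global row index can be split into (holder, offset)
with a shift and a mask, while holders of varying sizes need a lookup over
each holder's starting index.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not taken from Drill: compares locating a row when all
// batch holders share one fixed size versus when each holder has its own size.
public class BatchIndexSketch {

    // Assumed fixed holder capacity of 4096 rows (2^12); any power of two works.
    static final int FIXED_SHIFT = 12;
    static final int FIXED_MASK = (1 << FIXED_SHIFT) - 1;

    // Fixed-size holders: a shift gives the holder, a mask gives the offset.
    static int[] locateFixed(int globalIdx) {
        return new int[] { globalIdx >>> FIXED_SHIFT, globalIdx & FIXED_MASK };
    }

    // Variable-size holders: holderStarts.get(i) is the global index of the
    // first row in holder i (ascending, first entry 0). A binary search finds
    // the last holder whose start is <= globalIdx.
    static int[] locateVariable(List<Integer> holderStarts, int globalIdx) {
        int lo = 0;
        int hi = holderStarts.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (holderStarts.get(mid) <= globalIdx) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return new int[] { lo, globalIdx - holderStarts.get(lo) };
    }

    public static void main(String[] args) {
        int[] fixed = locateFixed(10000);
        System.out.println("fixed: holder=" + fixed[0] + " offset=" + fixed[1]);

        // Holders of 4096, 2048, 1024, ... rows, described by their start indexes.
        List<Integer> starts = Arrays.asList(0, 4096, 6144, 7168);
        int[] variable = locateVariable(starts, 7000);
        System.out.println("variable: holder=" + variable[0] + " offset=" + variable[1]);
    }
}
```

   In this sketch the fixed-size lookup is two bit operations, while the
variable-size lookup needs extra bookkeeping (the list of start indexes) and a
search on every access.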
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> limit batch size for hash aggregate
> -----------------------------------
>
>                 Key: DRILL-6310
>                 URL: https://issues.apache.org/jira/browse/DRILL-6310
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.13.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
