[
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517219#comment-16517219
]
ASF GitHub Bot commented on DRILL-6310:
---------------------------------------
ilooner commented on a change in pull request #1324: DRILL-6310: limit batch
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r196468069
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
##########
@@ -1317,7 +1364,7 @@ private void checkGroupAndAggrValues(int incomingRowIdx)
{
useReservedValuesMemory(); // try to preempt an OOM by using the reserve
- addBatchHolder(currentPartition); // allocate a new (internal) values batch
+ addBatchHolder(currentPartition, getBatchSize()); // allocate a new (internal) values batch
Review comment:
Padma, I agree that we want to limit the size of output batches, and that
reducing the batch holder size is a great change. Having BatchHolders that
always hold 64K rows is not practical. My issue is with changing the batch
holder size dynamically. I think it adds complexity without a concrete benefit.
Since new data will keep being added to old BatchHolders, data will never
really go into a BatchHolder that was appropriately sized for it. Since the
complex approach of dynamically changing BatchHolder sizes can't give us an
accurate solution anyway, I think we should go with the simpler approach. We
can still use all your changes; I just think we shouldn't continue updating the
BatchHolder size.
The added complexity is the extra overhead of computing indexes in the
hashtable, and the refactored memory calculations I am adding will become more
complex as well.
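To illustrate the index-computation overhead mentioned above, here is a minimal
sketch. It is not the code in HashAggTemplate or the hashtable; the class
BatchIndexSketch, its method names, the 64K shift constant, and the starts
array are all hypothetical and assumed only for illustration. With one fixed
power-of-two holder size, locating a row is a shift and a mask. If holders can
have different sizes, the lookup needs per-holder start offsets and a search.

```java
import java.util.Arrays;

/** Hypothetical sketch only; names and constants are assumptions, not Drill code. */
final class BatchIndexSketch {
    // Fixed-size case: every holder has the same power-of-two capacity.
    static final int BATCH_SHIFT = 16;                    // assumed 64K rows per holder
    static final int BATCH_MASK = (1 << BATCH_SHIFT) - 1; // low 16 bits

    /** Holder number for a global row index when all holders are the same size. */
    static int fixedHolder(int globalIdx) {
        return globalIdx >>> BATCH_SHIFT;
    }

    /** Row offset inside the holder when all holders are the same size. */
    static int fixedOffset(int globalIdx) {
        return globalIdx & BATCH_MASK;
    }

    /**
     * Variable-size case: starts[i] is the global index of the first row in
     * holder i (strictly increasing, starts[0] == 0). A binary search is
     * needed to find the holder that contains globalIdx.
     */
    static int variableHolder(int[] starts, int globalIdx) {
        int pos = Arrays.binarySearch(starts, globalIdx);
        // Not found: pos == -(insertionPoint) - 1, and the holder is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Row offset inside the holder for the variable-size case. */
    static int variableOffset(int[] starts, int globalIdx) {
        return globalIdx - starts[variableHolder(starts, globalIdx)];
    }
}
```

With a single reduced holder size, only the two constants would change; the
variable-size path also has to keep the starts array up to date whenever a
holder is added.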
> limit batch size for hash aggregate
> -----------------------------------
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.