[ https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517107#comment-16517107 ]
ASF GitHub Bot commented on DRILL-6310:
---------------------------------------
ppadma commented on a change in pull request #1324: DRILL-6310: limit batch
size for hash aggregate
URL: https://github.com/apache/drill/pull/1324#discussion_r196434655
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
##########
@@ -1317,7 +1364,7 @@ private void checkGroupAndAggrValues(int incomingRowIdx)
{
useReservedValuesMemory(); // try to preempt an OOM by using the reserve
- addBatchHolder(currentPartition); // allocate a new (internal) values batch
+ addBatchHolder(currentPartition, getBatchSize()); // allocate a new (internal) values batch
Review comment:
Adjusting the batch holder size here means adjusting the number of rows in the
batch, based on the average row width. The idea is to limit size in terms of
memory, not in terms of number of rows. Batches are limited to 16MB (or whatever
the configured output batch size is). By allocating huge batches and partially
transmitting them, we might be able to limit the output batch size, but that
does not produce much benefit. We want to avoid huge memory allocations.
Why should we not change the batch holder size?
If we size based only on the first batch, it creates the exact problem you
mentioned, i.e. the holders will be sized based on older input data and may not
make much sense for new data.
What I have is not a perfect solution. In fact, I don't even know if
such a solution is possible or exists. This should work fine on average, by the
law of averages.
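To make the sizing idea concrete, here is a minimal sketch of deriving a row count from a memory limit and a running average row width. It is not the code in this pull request: the class BatchSizeSketch, its methods, and the MAX_ROWS cap are hypothetical names chosen for illustration, and the 64K row cap is an assumption.

```java
// Hypothetical sketch: size a batch by memory rather than by a fixed row count.
// None of these names come from HashAggTemplate; they are illustrative only.
final class BatchSizeSketch {
    // Assumed upper bound on rows per batch (illustrative value).
    static final int MAX_ROWS = 65536;

    private long totalBytes; // bytes seen across all incoming batches so far
    private long totalRows;  // rows seen across all incoming batches so far

    // Fold one incoming batch into the running totals.
    void update(long batchBytes, long batchRows) {
        if (batchBytes < 0 || batchRows < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        totalBytes += batchBytes;
        totalRows += batchRows;
    }

    // Average row width in bytes over everything seen so far, rounded up.
    // Returns 0 when no rows have been seen yet.
    long averageRowWidth() {
        if (totalRows == 0) {
            return 0;
        }
        return (totalBytes + totalRows - 1) / totalRows;
    }

    // Number of rows that fit in memoryLimitBytes at the given average width,
    // clamped to the range [1, MAX_ROWS].
    static int rowsPerBatch(long memoryLimitBytes, long avgRowWidthBytes) {
        if (memoryLimitBytes <= 0) {
            throw new IllegalArgumentException("memory limit must be positive");
        }
        if (avgRowWidthBytes <= 0) {
            return MAX_ROWS; // no width estimate yet: fall back to the row cap
        }
        long rows = memoryLimitBytes / avgRowWidthBytes;
        return (int) Math.max(1, Math.min(MAX_ROWS, rows));
    }
}
```

Because the average is recomputed as new batches arrive, each newly allocated holder reflects all data seen so far rather than only the first batch. Individual rows can still be much wider than the average, so this bounds memory only approximately.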
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> limit batch size for hash aggregate
> -----------------------------------
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)