[
https://issues.apache.org/jira/browse/DRILL-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517225#comment-16517225
]
ASF GitHub Bot commented on DRILL-6310:
---------------------------------------
ilooner commented on issue #1324: DRILL-6310: limit batch size for hash
aggregate
URL: https://github.com/apache/drill/pull/1324#issuecomment-398439832
I'm okay with delaying changes to updateEstMaxBatchSize as long as all the
QA tests pass.
I think there are two problems at play here.
1. BatchHolders default to 64k rows. This change addresses this issue; my
only comment is to not change the batch holder size dynamically.
2. Limit output batch size.
I think limiting output batch size should be treated independently from the
BatchHolder size. When we output batches we must do some additional math to
pick the correct number of rows from the BatchHolders. This is because the
BatchHolders contain a variety of data that our statistics did not
accurately capture at the time we picked the BatchHolder size. I agree that the
solution for problem (2) should be dynamic, and in fact we can have an accurate
solution for it, since we know which BatchHolders the records for each output
batch come from.
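To make the "additional math" concrete, here is a rough sketch of how the row count for one output batch could be derived from a BatchHolder's measured size. The class name, method name, and parameters are all hypothetical and not taken from the Drill code base. It assumes we can read the actual bytes and row count of the holder at output time.

```java
// Hypothetical helper, not part of Drill: picks how many rows to copy from
// one BatchHolder into an output batch so the batch stays under a byte limit.
final class OutputBatchSizer {

    private OutputBatchSizer() {
    }

    /**
     * @param holderBytes           actual memory used by the BatchHolder (assumed measurable)
     * @param holderRows            number of rows currently stored in the BatchHolder
     * @param outputBatchLimitBytes configured upper bound for one output batch
     * @return number of rows to emit in one output batch from this holder
     */
    static int rowsPerOutputBatch(long holderBytes, int holderRows, long outputBatchLimitBytes) {
        if (holderRows <= 0) {
            return 0; // nothing to emit
        }
        // Average row width taken from the holder itself (rounded down),
        // rather than from the estimate made when the holder size was chosen.
        long avgRowBytes = Math.max(1L, (holderBytes + holderRows - 1) / holderRows);
        long rowsThatFit = outputBatchLimitBytes / avgRowBytes;
        // Emit at least one row so output always makes progress, and never
        // more rows than the holder contains.
        return (int) Math.max(1L, Math.min(rowsThatFit, (long) holderRows));
    }
}
```

For example, a 64k-row holder occupying 64 MiB with a 16 MiB output limit would be emitted 16384 rows at a time, without changing the holder size itself.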
> limit batch size for hash aggregate
> -----------------------------------
>
> Key: DRILL-6310
> URL: https://issues.apache.org/jira/browse/DRILL-6310
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Fix For: 1.14.0
>
>
> limit batch size for hash aggregate based on memory.