[
https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pritesh Maker updated DRILL-6071:
---------------------------------
Issue Type: Improvement (was: Bug)
> Limit batch size for flatten operator
> -------------------------------------
>
> Key: DRILL-6071
> URL: https://issues.apache.org/jira/browse/DRILL-6071
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Flow
> Affects Versions: 1.12.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.13.0
>
>
> The flatten operator currently uses an adaptive algorithm to control the
> outgoing batch size.
> While processing the input batch, it adjusts the number of records in the
> outgoing batch based on the memory used so far. Once memory usage exceeds the
> configured limit for a batch, the algorithm becomes more proactive and
> adjusts the limit halfway through and at the end of every batch. This periodic
> checking of memory usage is unnecessary overhead and hurts performance.
> Also, an overshoot of the limit is detected only after the fact.
> Instead, determine from the incoming batch how many rows the outgoing batch
> should contain.
> To do that, estimate the average row size of the outgoing batch and, from
> that, compute how many rows fit in a given amount of memory. Value vectors
> provide the information needed for this estimate.
> The row count of the output batch should be derived from memory (with a min
> of 1 and a max of 64k rows) rather than hard coded (to 4K) in code. The
> memory for the output batch should be a configurable system option.
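
The sizing rule described above can be sketched roughly as follows. This is only an illustration, not Drill's actual code: the class name OutputBatchSizer, its method names, and the choice to read "64k" as 65,536 rows are all assumptions made for this example. The average row size is taken as total bytes divided by row count, rounded up so the estimate errs on the side of fewer rows.

```java
// Hypothetical helper for illustration only; not part of Apache Drill's API.
final class OutputBatchSizer {
    static final int MIN_ROWS = 1;
    // Assumption: "64k rows" in the issue text means 64 * 1024 = 65,536.
    static final int MAX_ROWS = 64 * 1024;

    private OutputBatchSizer() {}

    /** Average bytes per row, rounded up; returns 0 for an empty batch. */
    static long avgRowBytes(long totalBytes, int rowCount) {
        if (totalBytes < 0 || rowCount < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        if (rowCount == 0) {
            return 0;
        }
        // Ceiling division without relying on floating point.
        return (totalBytes + rowCount - 1) / rowCount;
    }

    /**
     * Number of rows to place in one output batch, given a memory budget
     * and the estimated average size of an output row. The result is
     * clamped to the range [MIN_ROWS, MAX_ROWS].
     */
    static int rowCount(long batchMemoryLimitBytes, long avgOutputRowBytes) {
        if (batchMemoryLimitBytes <= 0) {
            throw new IllegalArgumentException("memory limit must be positive");
        }
        if (avgOutputRowBytes <= 0) {
            // No size information available: fall back to the upper bound.
            return MAX_ROWS;
        }
        long rows = batchMemoryLimitBytes / avgOutputRowBytes;
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }
}
```

For example, with a 16 MiB budget and an estimated 1 KiB per output row, OutputBatchSizer.rowCount(16L * 1024 * 1024, 1024) yields 16384 rows. A row estimated to be larger than the whole budget still yields 1 row, so the operator always makes progress.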