[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pritesh Maker updated DRILL-6071:
---------------------------------
    Issue Type: Improvement  (was: Bug)

> Limit batch size for flatten operator
> -------------------------------------
>
>                 Key: DRILL-6071
>                 URL: https://issues.apache.org/jira/browse/DRILL-6071
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> flatten currently uses an adaptive algorithm to control the outgoing batch
> size.
> While processing the input batch, it adjusts the number of records in the
> outgoing batch based on memory usage so far. Once memory usage exceeds the
> configured limit for a batch, the algorithm becomes more proactive and
> adjusts the limit halfway through and at the end of every batch. All this
> periodic checking of memory usage is unnecessary overhead and impacts
> performance. Also, we learn that the limit was exceeded only after the fact.
> Instead, figure out from the incoming batch how many rows the outgoing batch
> should contain.
> The way to do that is to estimate the average row size of the outgoing
> batch and, based on that, compute how many rows fit in a given amount of
> memory. Value vectors provide the information needed to figure this out.
> The row count of the output batch should be decided based on memory (with a
> minimum of 1 and a maximum of 64k rows) and not hard-coded (to 4K) in code.
> The memory for the output batch should be a configurable system option.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
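
For readers of the archive, the sizing rule proposed in the description can be sketched roughly as follows. This is a minimal illustration, not Drill's actual code: the class FlattenBatchSizer, its method names, and the way the average row size is estimated (non-flattened column bytes per input row plus one element of the repeated column) are all hypothetical. "64k" is read here as 64 * 1024 rows.

```java
// Hypothetical sketch of the sizing rule in DRILL-6071; not Drill's implementation.
final class FlattenBatchSizer {
    // Bounds taken from the issue text: minimum 1 row, maximum 64k rows.
    static final int MIN_ROWS = 1;
    static final int MAX_ROWS = 64 * 1024;

    private FlattenBatchSizer() {}

    /**
     * Assumed estimate of the average outgoing row size: each output row
     * carries the non-flattened columns of its input row plus one element
     * of the flattened (repeated) column.
     */
    static long estimateOutputRowBytes(long nonFlattenBytes,
                                       long flattenColumnBytes,
                                       long inputRows,
                                       long flattenElements) {
        if (inputRows <= 0 || flattenElements <= 0) {
            return 1L; // nothing to measure; fall back to the smallest size
        }
        long perRow = nonFlattenBytes / inputRows;
        long perElement = flattenColumnBytes / flattenElements;
        return Math.max(1L, perRow + perElement);
    }

    /**
     * Rows that fit in the configured output batch memory, clamped to
     * [MIN_ROWS, MAX_ROWS].
     */
    static int outputRowCount(long outputBatchBytes, long avgOutputRowBytes) {
        if (outputBatchBytes <= 0) {
            throw new IllegalArgumentException("output batch memory must be positive");
        }
        long rowBytes = Math.max(1L, avgOutputRowBytes);
        long rows = outputBatchBytes / rowBytes;
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }
}
```

With a 16 MiB output budget, an average row of 512 bytes gives 32768 rows, very narrow rows are capped at 65536, and a row wider than the whole budget still yields 1 row so the operator can make progress. The row count is computed once per incoming batch, which is what removes the periodic memory checks the description objects to.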