Bridget Bevens commented on DRILL-6071:

Added the option to this page: 
 and this page



Removing doc impacting flag. Please reset if there's any issue.


> Limit batch size for flatten operator
> -------------------------------------
>                 Key: DRILL-6071
>                 URL: https://issues.apache.org/jira/browse/DRILL-6071
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.13.0
> flatten currently uses an adaptive algorithm to control the outgoing batch 
> size. 
>  While processing the input batch, it adjusts the number of records in 
> outgoing batch based on memory usage so far. Once memory usage exceeds the 
> configured limit for a batch, the algorithm becomes more proactive and 
> adjusts the limit half way through and end of every batch. All this periodic 
> checking of memory usage is unnecessary overhead and impacts performance. 
> Also, we will know only after the fact.
> Instead, figure out how many rows should be there in the outgoing batch from 
> incoming batch.
>  The way to do that would be to figure out average row size of the outgoing 
> batch and based on that figure out how many rows can be there for a given 
> amount of memory. value vectors provide us the necessary information to be 
> able to figure this out.
> Row count in output batch should be decided based on memory (with min 1 and 
> max 64k rows) and not hard coded (to 4K) in code. Memory for output batch 
> should be configurable system option.

This message was sent by Atlassian JIRA

Reply via email to