Padma Penumarthy created DRILL-6071:
---------------------------------------

             Summary: Limit batch size for flatten operator
                 Key: DRILL-6071
                 URL: https://issues.apache.org/jira/browse/DRILL-6071
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.12.0
            Reporter: Padma Penumarthy
            Assignee: Padma Penumarthy
             Fix For: 1.13.0


Flatten currently uses an adaptive algorithm to control the outgoing batch 
size. 
While processing the input batch, it adjusts the number of records in the 
outgoing batch based on the memory used so far. Once memory usage exceeds the 
configured limit, the algorithm becomes more proactive and adjusts the limit 
halfway through and at the end of every batch. All this periodic checking of 
memory usage is unnecessary overhead and hurts performance. Also, we learn that 
the limit was exceeded only after the fact. 

Instead, figure out from the incoming batch how many rows the outgoing batch 
should contain.
To do that, estimate the average row size of the outgoing batch and, from that, 
compute how many rows fit in a given amount of memory. Value vectors provide 
the information needed to work this out.
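
A rough sketch of that calculation, for illustration only. The class, method, 
and parameter names are hypothetical and are not taken from the Drill code 
base, and the 65,536 row cap is an assumption of this sketch. It assumes the 
byte counts and element counts for the incoming batch have already been read 
from the value vectors. Each outgoing row is estimated as one copy of the 
non-flattened columns plus one element of the flattened column.

```java
public final class FlattenBatchSizer {
    // Assumed upper bound on rows per batch (hypothetical constant for this sketch).
    public static final int MAX_ROWS = 65_536;

    /**
     * Estimates how many rows the outgoing batch may hold.
     *
     * @param otherColumnsBytes   bytes used by the non-flattened columns of the incoming batch
     * @param incomingRows        number of rows in the incoming batch
     * @param flattenColumnBytes  bytes used by the elements of the column being flattened
     * @param flattenElements     total number of elements in the column being flattened
     * @param memoryLimitBytes    memory budget for one outgoing batch
     */
    public static int outgoingRowLimit(long otherColumnsBytes, int incomingRows,
                                       long flattenColumnBytes, long flattenElements,
                                       long memoryLimitBytes) {
        if (incomingRows <= 0 || flattenElements <= 0) {
            return 0; // nothing to flatten
        }
        // Round averages up so the row-size estimate errs on the large side.
        long otherPerRow = ceilDiv(otherColumnsBytes, incomingRows);
        long elementSize = ceilDiv(flattenColumnBytes, flattenElements);
        long avgRowBytes = Math.max(1, otherPerRow + elementSize);

        long rows = memoryLimitBytes / avgRowBytes;
        // Always allow at least one row, never more than the assumed cap.
        return (int) Math.max(1, Math.min(MAX_ROWS, rows));
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        // 1,000 incoming rows with 100,000 bytes of non-flattened data (100 bytes/row),
        // 10,000 array elements in 80,000 bytes (8 bytes/element), 1 MiB budget.
        System.out.println(outgoingRowLimit(100_000, 1_000, 80_000, 10_000, 1L << 20));
    }
}
```

With these example numbers the estimated outgoing row is 108 bytes, so a 1 MiB 
budget allows 9709 rows. The limit is computed once per incoming batch, so no 
memory checks are needed while the batch is being processed.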

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
