[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333867#comment-16333867 ]

ASF GitHub Bot commented on DRILL-6071:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1091#discussion_r162846630
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractBase.java ---
    @@ -119,6 +119,16 @@ public void setMaxAllocation(long maxAllocation) {
         /*throw new DrillRuntimeException("Unsupported method: setMaxAllocation()");*/
       }
     
    +  @Override
    --- End diff --
    
    If the size is a system option, then it need not be passed in each operator
    definition. Note that all operators should use the same batch size, so we'd
    never set this per operator. (The output of one operator is the input to
    another. We want to control the size of the inputs, and we can only do that
    by controlling the outputs.)


> Limit batch size for flatten operator
> -------------------------------------
>
>                 Key: DRILL-6071
>                 URL: https://issues.apache.org/jira/browse/DRILL-6071
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>             Fix For: 1.13.0
>
>
> flatten currently uses an adaptive algorithm to control the outgoing batch
> size.
> While processing the input batch, it adjusts the number of records in the
> outgoing batch based on memory usage so far. Once memory usage exceeds the
> configured limit for a batch, the algorithm becomes more proactive and
> adjusts the limit halfway through and at the end of every batch. All this
> periodic checking of memory usage is unnecessary overhead and impacts
> performance. Also, we learn that a batch exceeded the limit only after the
> fact.
> Instead, figure out from the incoming batch how many rows the outgoing batch
> should hold.
> The way to do that is to figure out the average row size of the outgoing
> batch and, based on that, how many rows fit in a given amount of memory.
> Value vectors provide the information needed to figure this out.
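
The row-count calculation described above can be sketched roughly as follows. This is only an illustration, not Drill code: the class name, method names, and the 65,536-row cap passed in the example are assumptions made for the sketch. The description also does not say how the outgoing row size is derived from the value vectors, so the sketch simply takes total bytes and row count as inputs.

```java
// Hypothetical helper, not part of Drill: derives an output row count
// from a memory budget and an estimated average row width.
final class BatchSizeSketch {
    private BatchSizeSketch() {}

    /** Average bytes per row, rounded up; returns at least 1 so callers can divide by it. */
    public static long averageRowWidth(long totalBytes, int rowCount) {
        if (rowCount <= 0) {
            return 1;
        }
        return Math.max(1, (totalBytes + rowCount - 1) / rowCount);
    }

    /**
     * Rows that fit in memoryLimitBytes at the given average row width,
     * clamped to the range [1, maxRows]. maxRows is an assumed upper bound
     * on rows per batch supplied by the caller.
     */
    public static int rowsPerBatch(long memoryLimitBytes, long avgRowWidthBytes, int maxRows) {
        long rows = memoryLimitBytes / Math.max(1, avgRowWidthBytes);
        return (int) Math.max(1, Math.min(rows, maxRows));
    }

    public static void main(String[] args) {
        // Example: 1,000 estimated output rows occupying 4,000,000 bytes,
        // with an assumed 16 MiB budget per outgoing batch.
        long avg = averageRowWidth(4_000_000L, 1_000);
        int rows = rowsPerBatch(16L * 1024 * 1024, avg, 65_536);
        System.out.println(avg + " bytes/row -> " + rows + " rows per batch");
    }
}
```

The point of the sketch is that the row count is computed once per incoming batch, up front, rather than by checking memory usage repeatedly while the outgoing batch is being filled.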



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
