[ 
https://issues.apache.org/jira/browse/DRILL-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575578#comment-16575578
 ] 

Paul Rogers commented on DRILL-6678:
------------------------------------

The SVR copies data. The size of the output batch can be no larger than the 
input. In the worst case, output will be a single row.
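
To make the copy step concrete, here is a minimal sketch, assuming a batch can be modeled as a plain Java list and the selection vector as an array of distinct row indexes. The class and method names are hypothetical and are not Drill's actual SVR code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the copy an SVR performs; not Drill's real classes.
final class SelectionCopySketch {
    private SelectionCopySketch() {}

    /**
     * Copies only the rows whose indexes appear in the selection vector.
     * With distinct indexes, the output can never hold more rows than the input.
     */
    static <T> List<T> copySelected(List<T> inputBatch, int[] selectionVector) {
        List<T> output = new ArrayList<>(selectionVector.length);
        for (int rowIndex : selectionVector) {
            output.add(inputBatch.get(rowIndex));
        }
        return output;
    }
}
```

For example, copying indexes 1 and 3 out of a four-row batch yields a two-row output, and a filter that keeps one row yields a one-row output.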

Is the proposal here to consolidate multiple incoming batches into a single 
output batch to preserve an ideal batch size? If so, that changes the semantics 
of the operator somewhat.

Is the idea to do all or nothing for each incoming batch? Either append it all 
to the output batch, or send off the current output and start anew?

Or, is the idea to append rows from multiple incoming batches until the output 
batch reaches the target size? That is, if A, B and C are incoming batches, the 
output batch may have all selected rows from A and B, and, say, half the 
selected rows from C.
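
The two consolidation options asked about above can be compared with a rough sketch. It assumes each incoming batch is just a list of already-selected rows and that the target is a row count; the names packWholeBatches and packRows are made up for illustration and do not exist in Drill.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of two consolidation policies; not Drill code.
// Both methods assume targetRows > 0.
final class PackingPolicySketch {
    private PackingPolicySketch() {}

    /**
     * All or nothing: an incoming batch is never split. If it does not fit,
     * the current output is sent off and a new one is started. A single
     * incoming batch larger than the target still goes out as one output.
     */
    static List<List<Integer>> packWholeBatches(List<List<Integer>> incoming, int targetRows) {
        List<List<Integer>> outputs = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (List<Integer> batch : incoming) {
            if (!current.isEmpty() && current.size() + batch.size() > targetRows) {
                outputs.add(current);
                current = new ArrayList<>();
            }
            current.addAll(batch);
        }
        if (!current.isEmpty()) {
            outputs.add(current);
        }
        return outputs;
    }

    /**
     * Row by row: rows are appended until the output reaches the target,
     * so one incoming batch may be split across two outputs.
     */
    static List<List<Integer>> packRows(List<List<Integer>> incoming, int targetRows) {
        List<List<Integer>> outputs = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (List<Integer> batch : incoming) {
            for (Integer row : batch) {
                current.add(row);
                if (current.size() == targetRows) {
                    outputs.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            outputs.add(current);
        }
        return outputs;
    }
}
```

With A = 3 rows, B = 2 rows, C = 4 rows and a target of 7, the first policy emits outputs of 5 and 4 rows, while the second emits 7 and 2: all of A and B plus part of C in the first output.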

If the goal is to consolidate, then you can get a rough cut using batch sizing 
(the "sizer"). But the description mentions "maximum utilization." The best 
way to achieve actual maximum utilization (rather than approximate) is to use 
the Result Set Loader: its whole purpose is to pack rows into a batch until it 
just meets the target output size. Using that might save having to reinvent 
some of the same wheels.
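
As a rough illustration of packing to a size target rather than a row count, the sketch below fills each batch until the next row would push it past a byte budget. It is not the Result Set Loader API: the class, the pack method, and the use of UTF-8 string length as the row size are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical byte-budget packer; it only illustrates "fill until the
// target size is met" and does not use any Drill class.
final class ByteBudgetPackerSketch {
    private ByteBudgetPackerSketch() {}

    /**
     * Packs rows into batches whose measured size stays within targetBytes.
     * A single row larger than the budget is emitted in a batch of its own.
     */
    static List<List<String>> pack(List<String> rows, int targetBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String row : rows) {
            // Assumed size model: the UTF-8 byte length of the row.
            int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && currentBytes + rowBytes > targetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Because the size of every row is measured as it is added, batches fill right up to the budget. An estimate based on average row width can only approximate that when row widths vary.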


> Improve SelectionVectorRemover to pack output batch based on BatchSizing
> ------------------------------------------------------------------------
>
>                 Key: DRILL-6678
>                 URL: https://issues.apache.org/jira/browse/DRILL-6678
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.14.0
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>
> SelectionVectorRemover is, in most cases, downstream of Filter, which 
> reduces the number of records to be copied into the output container. In 
> those cases, if SelectionVectorRemover can pack the output batch to maximum 
> utilization, it will emit fewer output batches, which will help to improve 
> performance. During Lateral & Unnest performance evaluation we noticed a 
> significant decrease in performance as the number of batches increases for 
> the same number of rows (i.e., batches are not fully packed).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
