[ 
https://issues.apache.org/jira/browse/DRILL-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579069#comment-16579069
 ] 

Paul Rogers commented on DRILL-6683:
------------------------------------

While this seems like a good idea, it does get to the core of the design of the 
{{VectorContainer}} vs. {{RecordBatch}} abstractions.

Despite its name, {{RecordBatch}} is an *operator*, not a batch of data. A 
{{RecordBatch}} (operator) has an associated output batch of data (a record 
batch but not a {{RecordBatch}}) represented by a {{VectorContainer}}. Metadata 
for that container is described by {{BatchSchema}}, which is stored in the 
{{VectorContainer}}. Since a full record batch is defined by a set of vectors 
*and* its associated selection vector, it seems odd to disassociate them.

Rather than remove the methods from {{VectorContainer}}, a better longer-term 
change would be to move the selection vector into the {{VectorContainer}}. 
Today, it is an odd add-on maintained by the operator (the so-called 
{{RecordBatch}}), not by the record batch (the so-called {{VectorContainer}}).
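
To make the idea concrete, here is a rough sketch of a batch that owns both its data and its selection vector. This is not Drill code: {{BatchSketch}} and its methods are invented for illustration, a plain {{int[]}} stands in for a value vector, and a second {{int[]}} stands in for an SV2.

```java
// Hypothetical sketch only: none of these types exist in Drill.
// A "batch" that owns its data and its (optional) selection vector together.
final class BatchSketch {
    private final int[] values;     // stand-in for a value vector (one column)
    private final int[] selection;  // stand-in for an SV2; null means "no selection"

    BatchSketch(int[] values, int[] selection) {
        this.values = values;
        this.selection = selection;
    }

    // Number of rows visible to a consumer.
    int rowCount() {
        return selection == null ? values.length : selection.length;
    }

    // Read a logical row, applying the selection-vector indirection if present.
    int get(int row) {
        int physical = selection == null ? row : selection[row];
        return values[physical];
    }

    // A filter yields a new view over the same data; no values are copied.
    BatchSketch filter(java.util.function.IntPredicate keep) {
        int[] tmp = new int[rowCount()];
        int n = 0;
        for (int row = 0; row < rowCount(); row++) {
            int physical = selection == null ? row : selection[row];
            if (keep.test(values[physical])) {
                tmp[n++] = physical;
            }
        }
        return new BatchSketch(values, java.util.Arrays.copyOf(tmp, n));
    }
}
```

With this shape, a consumer asks one object for the row count and the values and never has to go back to the operator for the selection vector.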

As you've seen in the {{RowSet}} classes, a {{RowSet}} is the logical 
equivalent of (actually a wrapper for) both a {{VectorContainer}} and a 
selection vector.

Also, the newer stuff to come that builds on the result set loader splits the 
operator interface into three responsibilities:

* Operator
* Outgoing batch
* Iterator protocol driver

In this world, a {{RowSet}} (or the result set loader equivalent for reading) 
would represent the outgoing batch, while the operator would handle the work of 
transforming batches.
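
As a rough illustration of that three-way split (again, not the actual design: {{OutputBatchSketch}}, {{OperatorSketch}} and {{DriverSketch}} are made-up names, and the real iterator protocol has more states than "batch or end of data"):

```java
// Hypothetical sketch only: these types are illustrative, not Drill's API.

// 1. Outgoing batch: holds the data an operator has produced.
final class OutputBatchSketch {
    private final int[] rows;
    OutputBatchSketch(int[] rows) { this.rows = rows; }
    int rowCount() { return rows.length; }
    int get(int row) { return rows[row]; }
}

// 2. Operator: only produces/transforms batches; it tracks no protocol state.
interface OperatorSketch {
    // Returns the next output batch, or null when there is no more data.
    OutputBatchSketch next();
}

// 3. Iterator protocol driver: owns the call sequence and end-of-data handling.
final class DriverSketch {
    private final OperatorSketch op;
    private boolean done;

    DriverSketch(OperatorSketch op) { this.op = op; }

    // Pulls batches until the operator reports end of data;
    // returns the total number of rows seen during this call.
    int drain() {
        int total = 0;
        while (!done) {
            OutputBatchSketch batch = op.next();
            if (batch == null) {
                done = true;  // never call the operator again after end of data
            } else {
                total += batch.rowCount();
            }
        }
        return total;
    }
}
```

The point of the separation is that the operator can be tested with plain batches in and out, while the rules about when {{next()}} may be called live in one place.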

So, a long comment, because the design in this area needs work (as this bug 
suggests), but the fixes are subtle.

> move getSelectionVector2 and getSelectionVector4 from VectorAccessible 
> interface to RecordBatch interface
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6683
>                 URL: https://issues.apache.org/jira/browse/DRILL-6683
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
