Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1057
  
    To answer the two questions:
    
    1. The copier is used in multiple locations, some of which involve 
selection vectors (SVs). Sort uses a copier to merge rows coming from multiple 
sorted batches. The selection vector remover (SVR) compresses out SVs. A filter 
produces an SV2, which the SVR removes. An in-memory sort produces an SV4. But, 
because of the way plans are generated, the hash join will never see a batch 
with an SV. (An SVR will be inserted, if needed, to remove the SV.)
    
    2. We never write a batch using an SV. The SV is always a source 
indirection. Because we do indirection on the source side (and vectors are 
append only), there can be no SV on the destination side.
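
    To make the two points above concrete, here is a minimal sketch in plain 
Java, not Drill code. The class and method names ({{Sv2Sketch}}, 
{{filterPositive}}, {{removeSv2}}) are hypothetical, an int array stands in for 
a value vector, and a {{char[]}} stands in for the SV2 because each entry is a 
two-byte unsigned index. The filter leaves the source untouched and only 
records which rows passed; the remover reads through the SV and appends to a 
dense destination that needs no SV of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain arrays stand in for Drill's vectors and SV2.
final class Sv2Sketch {

    // "Filter": does not move any data. It only records the indexes of the
    // rows that pass the (hypothetical) predicate value > 0.
    static char[] filterPositive(int[] source) {
        char[] sv2 = new char[source.length];
        int count = 0;
        for (int i = 0; i < source.length; i++) {
            if (source[i] > 0) {
                sv2[count++] = (char) i;
            }
        }
        return Arrays.copyOf(sv2, count);
    }

    // "SVR": reads the source through the SV2 (source-side indirection) and
    // appends each selected value to the destination (append-only writes).
    static List<Integer> removeSv2(int[] source, char[] sv2) {
        List<Integer> dest = new ArrayList<>();
        for (char rowIndex : sv2) {
            dest.add(source[rowIndex]);
        }
        return dest;
    }

    public static void main(String[] args) {
        int[] source = {5, -1, 7, 0, 9};
        char[] sv2 = filterPositive(source);
        System.out.println(removeSv2(source, sv2));
    }
}
```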
    
    Note also that the {{VectorContainer}} class, despite its API, knows 
nothing about SVs. The SV is tacked on separately by the {{RecordBatch}}. (This 
is a less-than-ideal design, but it is how things work at present.) FWIW, the 
test-oriented {{RowSet}} abstractions came about as wrappers around both the 
{{VectorContainer}} and SV to provide a unified view.
    
    Because of how we do SVs, you'll need three copy methods: one for no SV, 
one for an SV2 and another for an SV4.
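
    As a rough illustration of what those three methods might look like, here 
is a sketch in plain Java. It is not the Drill copier API: {{CopierSketch}} and 
its method names are hypothetical, int arrays stand in for vectors, and a list 
stands in for the append-only destination. The SV4 decoding assumes each 
four-byte entry carries the batch index in its upper 16 bits and the row index 
in its lower 16 bits; treat that layout as an assumption to check against the 
code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one copy method per kind of source indirection.
final class CopierSketch {

    // No SV: rows are read in their physical order.
    static void copyNoSv(int[] source, List<Integer> dest) {
        for (int value : source) {
            dest.add(value);
        }
    }

    // SV2: each entry is a two-byte row index into a single batch.
    static void copySv2(int[] source, char[] sv2, List<Integer> dest) {
        for (char rowIndex : sv2) {
            dest.add(source[rowIndex]);
        }
    }

    // SV4: each entry addresses a row in one of several batches.
    // Assumed layout: upper 16 bits = batch index, lower 16 bits = row index.
    static void copySv4(int[][] batches, int[] sv4, List<Integer> dest) {
        for (int entry : sv4) {
            int batchIndex = entry >>> 16;
            int rowIndex = entry & 0xFFFF;
            dest.add(batches[batchIndex][rowIndex]);
        }
    }

    public static void main(String[] args) {
        int[][] batches = {{10, 20}, {30, 40}};
        int[] sv4 = {(1 << 16) | 1, 0, 1 << 16};
        List<Integer> dest = new ArrayList<>();
        copySv4(batches, sv4, dest);
        System.out.println(dest);
    }
}
```

    In all three cases the destination is written the same way; only the way 
the source row is located differs.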
    
    In the fullness of time, the new "column reader" and "column writer" 
abstractions will hide all this stuff, but it will take time before those tools 
come online.

