Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/1057
To answer the two questions:
1. The copier is used in multiple locations, some of which include
selection vectors. Sort uses a copier to merge rows coming from multiple sorted
batches. The SVR (selection vector remover) compresses out SVs. A filter
produces an SV2, which the SVR removes. An in-memory sort produces an SV4. But,
because of the way plans are generated, the hash join will never see a batch
with an SV. (An SVR will be inserted, if needed, to remove the SV.)
2. We never write a batch using an SV. The SV is always a source
indirection. Because we do indirection on the source side (and vectors are
append only), there can be no SV on the destination side.
Note also that the {{VectorContainer}} class, despite its API, knows
nothing about SVs. The SV is tacked on separately by the {{RecordBatch}}. (This
is a less-than-ideal design, but it is how things work at present.) FWIW, the
test-oriented {{RowSet}} abstractions came about as wrappers around both the
{{VectorContainer}} and SV to provide a unified view.
Because of how we do SVs, you'll need three copy methods: one for no SV,
one for an SV2 and another for an SV4.
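As a rough sketch of what the three variants might look like: the code below is plain Java over arrays, not Drill's vector or copier API. All names ({{CopierSketch}}, {{copyNoSv}}, {{copySv2}}, {{copySv4}}, {{sv4Entry}}) are made up for illustration, and the SV4 layout (upper 16 bits for the batch index, lower 16 bits for the row index) is an assumption of the sketch. In every case the indirection is on the source side and the destination is append-only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names are Drill classes.
class CopierSketch {

    // No SV: copy every source row, in order, appending to the destination.
    static void copyNoSv(int[] source, List<Integer> dest) {
        for (int value : source) {
            dest.add(value);
        }
    }

    // SV2: each entry is a 16-bit row offset into a single source batch.
    static void copySv2(int[] source, char[] sv2, List<Integer> dest) {
        for (char rowIndex : sv2) {
            dest.add(source[rowIndex]);
        }
    }

    // SV4: each entry addresses a row in one of several source batches.
    // Assumed layout: upper 16 bits = batch index, lower 16 bits = row index.
    static void copySv4(int[][] batches, int[] sv4, List<Integer> dest) {
        for (int entry : sv4) {
            int batchIndex = entry >>> 16;
            int rowIndex = entry & 0xFFFF;
            dest.add(batches[batchIndex][rowIndex]);
        }
    }

    // Helper that packs a (batch, row) pair using the assumed SV4 layout.
    static int sv4Entry(int batchIndex, int rowIndex) {
        return (batchIndex << 16) | (rowIndex & 0xFFFF);
    }

    public static void main(String[] args) {
        List<Integer> filtered = new ArrayList<>();
        // A filter kept rows 1 and 3 of the batch.
        copySv2(new int[] {10, 20, 30, 40}, new char[] {1, 3}, filtered);
        System.out.println(filtered); // [20, 40]

        List<Integer> merged = new ArrayList<>();
        int[][] batches = {{1, 5}, {2, 4}};
        // A sort ordered the rows across both batches.
        int[] sv4 = {sv4Entry(0, 0), sv4Entry(1, 0), sv4Entry(1, 1), sv4Entry(0, 1)};
        copySv4(batches, sv4, merged);
        System.out.println(merged); // [1, 2, 4, 5]
    }
}
```

Note that none of the methods takes an SV for the destination: rows are always appended in the order the source (or its SV) yields them.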
In the fullness of time, the new "column reader" and "column writer"
abstractions will hide all this stuff, but it will take time before those tools
come online.
---