Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/914
This next commit reintroduces a projection feature. With this change, a
client can:
* Define the set of columns to project
* Define the schema of the data source (table, file, etc.)
* Write columns according to the schema
* Harvest only the projected columns.
### Example
Here is a simple example adapted from `TestResultSetLoaderProjection`.
First, declare the projection:
```
List<SchemaPath> selection = Lists.newArrayList(
    SchemaPath.getSimplePath("c"),
    SchemaPath.getSimplePath("b"),
    SchemaPath.getSimplePath("e"));
```
Then, declare the schema. (Here, we declare the schema up-front. Projection
also works if the schema is defined as columns are discovered while creating a
batch.)
```
TupleMetadata schema = new SchemaBuilder()
    .add("a", MinorType.INT)
    .add("b", MinorType.INT)
    .add("c", MinorType.INT)
    .add("d", MinorType.INT)
    .buildSchema();
```
Then, use the options mechanisms to pass the information to the result set
loader:
```
ResultSetOptions options = new OptionBuilder()
    .setProjection(selection)
    .setSchema(schema)
    .build();
ResultSetLoader rsLoader = new ResultSetLoaderImpl(fixture.allocator(),
    options);
```
Now, we can write the four columns in the data source:
```
RowSetLoader rootWriter = rsLoader.writer();
rsLoader.startBatch();
// ...
rootWriter.start();
rootWriter.scalar("a").setInt(10);
rootWriter.scalar("b").setInt(20);
rootWriter.scalar("c").setInt(30);
rootWriter.scalar("d").setInt(40);
rootWriter.save();
```
But when we harvest the results, we get only the projected columns. Notice
that "e" is in the projection list, but does not exist in the table, and so
does not appear in the output. A higher level of code will handle this case.
```
#: b, c
0: 20, 30
```
### Maps
Although the above example does not show the feature, the mechanism also
handles maps and arrays of maps. The rules are:
* If the projection list includes specific map members (such as "m.b"),
then project only those map members.
* If the projection list includes just the map name (such as "m"), then
project all map members (such as "m.a" and "m.b").
* If the projection list does not include the map at all, then project
neither the map nor any of its members.
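The three rules above can be sketched as a small lookup over dotted paths. This is only an illustration: `MapProjectionSketch` and `isProjected` are hypothetical names, not the `ProjectionSet` code in this PR, and the sketch assumes names are compared exactly as written.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the map projection rules; not Drill code.
public class MapProjectionSketch {

  // Dotted paths from the projection list, e.g. "m" or "m.b".
  private final Set<String> projected = new HashSet<>();

  public MapProjectionSketch(List<String> paths) {
    projected.addAll(paths);
  }

  // Returns true if the column or map member at the dotted path is projected.
  public boolean isProjected(String path) {
    // The path itself is listed explicitly.
    if (projected.contains(path)) {
      return true;
    }
    // An enclosing map listed by name alone projects all of its members.
    int dot = path.lastIndexOf('.');
    while (dot > 0) {
      String parent = path.substring(0, dot);
      if (projected.contains(parent)) {
        return true;
      }
      dot = parent.lastIndexOf('.');
    }
    // A map must exist in the output if any of its members is listed.
    String prefix = path + ".";
    for (String p : projected) {
      if (p.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    MapProjectionSketch members = new MapProjectionSketch(Arrays.asList("m.b"));
    System.out.println(members.isProjected("m.b"));
    System.out.println(members.isProjected("m.a"));

    MapProjectionSketch wholeMap = new MapProjectionSketch(Arrays.asList("m"));
    System.out.println(wholeMap.isProjected("m.a"));

    MapProjectionSketch noMap = new MapProjectionSketch(Arrays.asList("x"));
    System.out.println(noMap.isProjected("m"));
  }
}
```

With a projection list of just "m.b", the sketch reports "m.b" (and the enclosing map "m") as projected but not "m.a"; with "m" alone, every member is projected; with neither, nothing under "m" is.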
### Implementation
The implementation builds on previous commits. The idea is that we create a
"dummy" column and writer, but we do not create the underlying value
vector. This allows the client to be blissfully ignorant of whether the column
is projected or not. On the other hand, if the client wants to know if a column
is projected (perhaps to optimize away certain operations), then the projection
status is available in the column metadata.
#### Projection Set
Projection starts with a `ProjectionSet` abstraction. Each tuple (row, map)
has a projection set. The projection set can be a set of names
(`ProjectionSetImpl`) or a default (`NullProjectionSet`).
#### Result Set Loader
The result set loader is modified to check if a column is projected. If so,
the code flow is the same as previously. If not, then the code will create the
dummy vector state and dummy writers described above.
Adding support for non-projected columns involved the usual amount of
refactoring and moving bits of code around to get a simple solution.
#### Accessor Factories
Prior versions had a `ColumnAccessorFactory` class that created both
readers and writers. This commit splits the class into separate reader and
writer factories. The writer factory now creates dummy writers if asked to
create a writer when the backing vector is null. To make this easier, factory
code that previously appeared in each writer has moved into the writer factory.
(Note that readers don't support projection: there is no need.)
#### Dummy Writers
The accessor layer is modified to create a set of dummy writers. Scalar
writers have a wide (internal) interface. Dummy scalar writers simply ignore
the unsupported operations. Dummy array and tuple writers are also provided.
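As a rough sketch of the idea (not the accessor classes in this PR), a dummy writer can be seen as a null-object implementation of the writer interface, chosen by the factory when there is no backing vector. All names below (`IntWriter`, `RealIntWriter`, `DummyIntWriter`, `newWriter`) are hypothetical, the interface is far narrower than the real one, and a plain `List` stands in for a value vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of dummy writers; not Drill's accessor code.
public class DummyWriterSketch {

  // Much-narrowed stand-in for a scalar writer interface.
  public interface IntWriter {
    void setInt(int value);
    boolean isProjected();
  }

  // Writer for a projected column; the List stands in for a value vector.
  public static class RealIntWriter implements IntWriter {
    private final List<Integer> vector;

    RealIntWriter(List<Integer> vector) {
      this.vector = vector;
    }

    @Override
    public void setInt(int value) {
      vector.add(value);
    }

    @Override
    public boolean isProjected() {
      return true;
    }
  }

  // Writer for an unprojected column: accepts values and discards them.
  public static class DummyIntWriter implements IntWriter {
    @Override
    public void setInt(int value) {
      // Intentionally empty: there is no vector to write to.
    }

    @Override
    public boolean isProjected() {
      return false;
    }
  }

  // Factory: a null backing vector yields a dummy writer.
  public static IntWriter newWriter(List<Integer> vector) {
    return vector == null ? new DummyIntWriter() : new RealIntWriter(vector);
  }

  public static void main(String[] args) {
    List<Integer> bVector = new ArrayList<>();
    IntWriter a = newWriter(null);     // "a" is not projected
    IntWriter b = newWriter(bVector);  // "b" is projected

    // The client writes both columns the same way.
    a.setInt(10);
    b.setInt(20);

    System.out.println("a projected: " + a.isProjected());
    System.out.println("b values: " + bVector);
  }
}
```

The client code is identical for both columns; only the factory knows which writer is real, and a client that cares can still ask via `isProjected()`.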
#### Unit Test
The new `TestResultSetLoaderProjection` test exercises the new code. The
new `DummyWriterTest` exercises the dummy writers.