GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/1206
DRILL-6314: Add complex types to result set loader
This PR is a large one: it adds Union, (non-repeated) List and
Repeated List type support to the column accessors, row set abstraction, result
set loader abstraction, and associated mechanisms. The good news is that, after
this PR, all the row set and result set loader work will be complete; we'll
then move on to the scan operator and readers.
Both Union and (non-repeated) List have very odd semantics that required
some creative gyrations in the existing code. A (non-repeated) List can hold a
single type (say, a List of VarChar), in which the list entries can be null to
model a JSON list:
```
{a: ["foo", "bar"]} {a: null}
```
List entries can also be unions (which can include null values). A List
starts as a simple list (one type), then gets "promoted" to a Union. Much
complexity was needed to hide this process behind the simple row set
abstractions.
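As a rough illustration of the promotion idea, here is a toy model in plain Java. It is not the Drill vector or accessor code: the class `PromotableList` and its methods are hypothetical, and the rule that a list counts as a union once a second non-null type is added is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; not Drill's ListVector, UnionVector or accessor API.
class PromotableList {
    // Types seen so far among non-null entries, in first-seen order.
    private final Set<Class<?>> memberTypes = new LinkedHashSet<>();
    // The entries themselves; null entries are allowed.
    private final List<Object> entries = new ArrayList<>();

    // Add an entry. A null entry is stored but does not change the member types.
    void add(Object value) {
        if (value != null) {
            memberTypes.add(value.getClass());
        }
        entries.add(value);
    }

    // Assumption for this sketch: more than one member type means the
    // list has been "promoted" and its entries are treated as unions.
    boolean isUnion() {
        return memberTypes.size() > 1;
    }

    Set<Class<?>> memberTypes() {
        return memberTypes;
    }

    List<Object> entries() {
        return entries;
    }
}
```

In this sketch, adding `"foo"`, `null` and `"bar"` leaves a single-type list, and a later `add(10)` flips `isUnion()` to true without the caller doing anything different, which is the kind of hiding the row set abstractions aim for.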
There are similarities between List and Union, and among List, Repeated List and
"normal" Repeated (array) types. The refactoring reflects these commonalities.
Due to the complexity of the added types, this PR revises the mechanisms
that build a row set from an existing schema, or a schema from a container.
This PR also includes a somewhat orthogonal projection mechanism that
implements projection in the row set mechanism for simple columns, array values
and elements within maps. This code is closely intertwined with schema
creation, and it was not worth the effort to tease the two apart into separate
PRs.
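To make the projection idea concrete, here is a small stand-alone sketch in plain Java. It is an illustration only, not the mechanism in this PR: the class `ProjectionSketch`, the use of nested `Map`s to stand in for rows and map columns, and the dotted-path syntax (`m.x`) are all assumptions, and array projection is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; rows are modeled as nested Maps, not Drill vectors.
class ProjectionSketch {

    // Keep only the columns named in `projected`.
    // "a" selects a top-level column; "m.x" selects member x of map m.
    static Map<String, Object> project(Map<String, Object> row, Set<String> projected) {
        return project(row, projected, "");
    }

    private static Map<String, Object> project(
            Map<String, Object> row, Set<String> projected, String prefix) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String path = prefix + e.getKey();
            if (projected.contains(path)) {
                // The column (or the entire map) is projected as-is.
                out.put(e.getKey(), e.getValue());
            } else if (e.getValue() instanceof Map) {
                // Not projected as a whole: look for projected members inside the map.
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                Map<String, Object> sub = project(child, projected, path + ".");
                if (!sub.isEmpty()) {
                    out.put(e.getKey(), sub);
                }
            }
        }
        return out;
    }
}
```

With a row holding columns `a`, `b` and a map `m` with members `x` and `y`, projecting `a` and `m.x` keeps `a` and a trimmed `m` containing only `x`; unprojected columns are simply never materialized in the output.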
Extensive unit tests show the results in action. These are probably the
best place to start to understand the client view of the new mechanisms.
The work is divided into a number of commits to help sort out the work for each
layer.
The row set mechanism is fully described
[here](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
Rather than write a long description here, please take a look at the code
and the Wiki post. Then, post questions (specific or general) and I'll address
those particular topics which need additional clarification.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/paul-rogers/drill DRILL-6314
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1206.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1206
----
commit 6bd9ac00fdeb1851799fb38db618f2d951bcd2c3
Author: Paul Rogers <progers@...>
Date: 2018-04-10T21:16:55Z
DRILL-6134: Vector revisions
commit 8bb54971ad331a95b30ebd28fc803b89568f9d8f
Author: Paul Rogers <progers@...>
Date: 2018-04-10T21:18:41Z
DRILL-6314: Vector accessor layer
commit 18b51ba387403abced03070724c72a2c8735901d
Author: Paul Rogers <progers@...>
Date: 2018-04-10T21:21:42Z
DRILL-6314: Row set layer
commit 333ad2c8d77a5c31d5c499da95bbb56a38989099
Author: Paul Rogers <progers@...>
Date: 2018-04-10T21:24:36Z
DRILL-6314: Result set loader layer
commit 54a26828019bbe1103f291b220bc2d20716f9680
Author: Paul Rogers <progers@...>
Date: 2018-04-10T21:25:09Z
DRILL-6314: Metadata layer
commit 655779f558d0a0cc36fd3a0a23a9c305b5adf521
Author: Paul Rogers <progers@...>
Date: 2018-04-10T21:25:29Z
DRILL-6314: Misc revisions
----