[
https://issues.apache.org/jira/browse/DRILL-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575586#comment-16575586
]
ASF GitHub Bot commented on DRILL-6676:
---------------------------------------
paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and
Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429
Previous commits provided the core "result set loader" (RSL) structure and
support for the "mainstream" vector types, including structured types such as
maps and lists.
This PR adds the "obscure" (and partly implemented) types used for JSON:
(non-repeated) list, repeated list and union.
The union type is complex: it is a bundle of vectors keyed by type, and can
accept new types as a run proceeds. A (non-repeated) list is highly complex:
it can act like a repeated list, but with the ability to specify a null state
for each entry. The non-repeated list can also act like a union type. This
dual/morphing nature of a list required some rather complex magic behind the
scenes to support the simple JSON-like interface used by the row set and result
set loader mechanisms.
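To make the "bundle of vectors keyed by type" idea concrete, here is a minimal Java sketch. It is not Drill's union vector or the RSL API: the class `UnionColumnSketch`, its `Tag` enum and its methods are hypothetical names invented for illustration, and plain `java.util` lists stand in for value vectors. It only shows one member column per type, a per-row type tag, and a new type being added after rows have already been written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Drill code): one member column per type plus a per-row type tag. */
class UnionColumnSketch {
    /** Illustrative type tags only. */
    enum Tag { INT, VARCHAR, FLOAT8 }

    // One "member vector" per type, created lazily when that type first appears.
    private final Map<Tag, List<Object>> members = new LinkedHashMap<>();
    // Per-row tag naming the member that holds the row's value.
    private final List<Tag> tags = new ArrayList<>();

    void write(Tag tag, Object value) {
        // A type added mid-run is back-filled with nulls for the rows already written.
        members.computeIfAbsent(tag, t -> {
            List<Object> fresh = new ArrayList<>();
            for (int i = 0; i < tags.size(); i++) {
                fresh.add(null);
            }
            return fresh;
        });
        // Every member gets a slot for this row; only the tagged member holds the value.
        for (Map.Entry<Tag, List<Object>> e : members.entrySet()) {
            e.getValue().add(e.getKey() == tag ? value : null);
        }
        tags.add(tag);
    }

    Tag typeOf(int row) { return tags.get(row); }

    Object read(int row) { return members.get(tags.get(row)).get(row); }

    int memberCount() { return members.size(); }

    int rowCount() { return tags.size(); }

    public static void main(String[] args) {
        UnionColumnSketch col = new UnionColumnSketch();
        col.write(Tag.INT, 10);
        col.write(Tag.VARCHAR, "fred"); // a new type appears on the second row
        col.write(Tag.INT, 30);
        for (int i = 0; i < col.rowCount(); i++) {
            System.out.println(col.typeOf(i) + ": " + col.read(i));
        }
    }
}
```

A reader of such a column has to consult the tag for each row before touching a value, which is part of why operators need explicit support for these types.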
This PR introduces the idea of a "variant" to model unions and non-repeated
lists that act as lists of unions. The name is taken from Microsoft Basic
and simply means a tagged union. (Here "union" is used in the C sense.)
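As a rough illustration of a tagged union, the Java sketch below pairs each value with a tag naming its current type, then builds a list of such values that mirrors the JSON array `[10, null, "fred"]`. The names `VariantSketch`, `Tag` and `sampleList` are hypothetical and chosen for this example; they are not the variant accessors added by this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a "variant": a value that carries a tag naming its current type. */
final class VariantSketch {
    /** Illustrative tags only. */
    enum Tag { NULL, INT, VARCHAR }

    final Tag tag;
    final Object value;

    private VariantSketch(Tag tag, Object value) {
        this.tag = tag;
        this.value = value;
    }

    static VariantSketch ofNull() { return new VariantSketch(Tag.NULL, null); }

    static VariantSketch ofInt(int v) { return new VariantSketch(Tag.INT, v); }

    static VariantSketch ofString(String v) { return new VariantSketch(Tag.VARCHAR, v); }

    boolean isNull() { return tag == Tag.NULL; }

    /** Mirrors the JSON array [10, null, "fred"]: mixed types, with a null state per entry. */
    static List<VariantSketch> sampleList() {
        List<VariantSketch> list = new ArrayList<>();
        list.add(ofInt(10));
        list.add(ofNull());
        list.add(ofString("fred"));
        return list;
    }

    public static void main(String[] args) {
        for (VariantSketch v : sampleList()) {
            // The consumer checks the tag before interpreting the value.
            switch (v.tag) {
                case INT:
                    System.out.println("int " + (Integer) v.value);
                    break;
                case VARCHAR:
                    System.out.println("string " + (String) v.value);
                    break;
                default:
                    System.out.println("null");
            }
        }
    }
}
```

The per-entry null is the piece a plain repeated type cannot express, so the sketch models it as a tag of its own.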
Changes include fixing a number of issues with the list vectors, adding
support in the column accessors and metadata layers, and adding support for
creating vectors from metadata and metadata from vectors.
Unit tests demonstrate how to use the new types and verify that the
behavior is correct.
The focus of this PR is to enable union, list and repeated list support in
the RSL and associated mechanisms. It is known that support of these vector
types is incomplete: some operators fail when presented with such vectors. It
is not the goal here to fix those issues: this is not a PR to fully support
these types. Rather, the scope of this PR is limited to the RSL and associated
classes.
For more information, see [this wiki
entry](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
This PR completes the result set loader work. The next PR in this series
will introduce revisions to the scan operator that allow readers to use the
RSL. After that, there are revised implementations for the delimited text (e.g.
CSV) and JSON readers.
> Add Union, List and Repeated List types to Result Set Loader
> ------------------------------------------------------------
>
> Key: DRILL-6676
> URL: https://issues.apache.org/jira/browse/DRILL-6676
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.15.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.15.0
>
>
> Add support for the "obscure" vector types to the {{ResultSetLoader}}:
> * Union
> * List
> * Repeated List