paul-rogers opened a new pull request #1429: DRILL-6676: Add Union, List and 
Repeated List types to Result Set Loader
URL: https://github.com/apache/drill/pull/1429
 
 
   Previous commits provided the core "result set loader" (RSL) structure and 
support for the "mainstream" vector types, including structured types such as 
maps and lists.
   
   This PR adds the "obscure" (and partly implemented) types used for JSON: 
(non-repeated) list, repeated list and union.
   
   The union type is complex: it is a bundle of vectors keyed by type, and can 
accept new types as a run proceeds. A (non-repeated) list is highly complex: it 
can act like a repeated list, but with the ability to specify a null state 
for each entry. The non-repeated list can also act like a union type. This 
dual/morphing nature of a list required some rather complex magic behind the 
scenes to support the simple JSON-like interface used by the row set and result 
set loader mechanisms.
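   
   As a rough illustration of the per-entry null state described above, here is 
a minimal sketch of a list column that keeps an offsets array, an "is set" flag 
per row, and a flat data array. The class and method names 
(`NullableListSketch`, `addList`, `addNull`) are hypothetical and are not 
Drill's list vector code; the sketch only shows how a null list differs from an 
empty one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not a Drill class.
final class NullableListSketch {
  // Row i occupies data[offsets[i]] .. data[offsets[i + 1] - 1].
  private final List<Integer> offsets = new ArrayList<>();
  // One flag per row: false means the whole list entry is null.
  private final List<Boolean> isSet = new ArrayList<>();
  // Flattened element values for all rows (Object, so element types may vary).
  private final List<Object> data = new ArrayList<>();

  NullableListSketch() {
    offsets.add(0); // start offset of the first row
  }

  // Append a non-null (possibly empty) list as the next row.
  void addList(List<?> values) {
    data.addAll(values);
    offsets.add(data.size());
    isSet.add(true);
  }

  // Append a null row: no data is written, only the flag differs.
  void addNull() {
    offsets.add(data.size());
    isSet.add(false);
  }

  boolean isNull(int row) {
    return !isSet.get(row);
  }

  // Returns a copy of the row's elements, or null for a null row.
  List<Object> get(int row) {
    if (isNull(row)) {
      return null;
    }
    return new ArrayList<>(data.subList(offsets.get(row), offsets.get(row + 1)));
  }

  int rowCount() {
    return isSet.size();
  }

  public static void main(String[] args) {
    NullableListSketch col = new NullableListSketch();
    col.addList(Arrays.asList(1, "a"));      // mixed element types
    col.addNull();                           // null entry
    col.addList(Collections.emptyList());    // empty, but not null
    for (int i = 0; i < col.rowCount(); i++) {
      System.out.println(i + ": " + col.get(i));
    }
  }
}
```

   A plain repeated list would need only the offsets and data; the extra flag 
per row is what lets a null entry be told apart from an empty list.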
   
   This PR introduces the idea of a "variant" to model unions and 
non-repeated-lists-as-list-of-unions. The name is taken from Microsoft Basic 
and simply means a tagged union (where "union" is, in turn, taken from C).
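   
   To make "tagged union" concrete, the following is a minimal sketch of a 
column that stores one type tag per row plus a bundle of per-type value lists, 
creating a member the first time a value of that type is written. All names 
here (`VariantColumnSketch`, `Tag`, `setInt`, and so on) are hypothetical and 
chosen for illustration; this is not the variant interface added by this PR.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a Drill class.
final class VariantColumnSketch {

  // Type tags; a stand-in for a real set of data types.
  enum Tag { NULL, INT, VARCHAR }

  // One tag per row: says which member holds that row's value.
  private final List<Tag> tags = new ArrayList<>();

  // The bundle of "vectors" keyed by type. Members are created lazily,
  // so a new type can appear part-way through a run.
  private final Map<Tag, List<Object>> members = new EnumMap<>(Tag.class);

  void setNull() { write(Tag.NULL, null); }
  void setInt(int value) { write(Tag.INT, value); }
  void setString(String value) { write(Tag.VARCHAR, value); }

  private void write(Tag tag, Object value) {
    int row = tags.size();
    tags.add(tag);
    if (tag == Tag.NULL) {
      return; // a null row has a tag but no member value
    }
    List<Object> member = members.computeIfAbsent(tag, t -> new ArrayList<>());
    // Pad with nulls so that each member can be indexed by row number.
    while (member.size() < row) {
      member.add(null);
    }
    member.add(value);
  }

  Tag typeAt(int row) {
    return tags.get(row);
  }

  Object valueAt(int row) {
    Tag tag = tags.get(row);
    return tag == Tag.NULL ? null : members.get(tag).get(row);
  }

  boolean hasType(Tag tag) {
    return members.containsKey(tag);
  }

  int rowCount() {
    return tags.size();
  }

  public static void main(String[] args) {
    VariantColumnSketch col = new VariantColumnSketch();
    col.setInt(10);
    col.setString("fred"); // the VARCHAR member is added mid-run
    col.setNull();
    for (int i = 0; i < col.rowCount(); i++) {
      System.out.println(col.typeAt(i) + ": " + col.valueAt(i));
    }
  }
}
```

   A reader of such a column checks the tag first and then fetches the value 
from the matching member, which is the same pattern a JSON-like interface can 
hide behind a single "get value" call.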
   
   Changes include fixing a number of issues with the list vectors, adding 
support in the column accessors and metadata layers, and adding support for 
creating vectors from metadata and metadata from vectors.
   
   Unit tests demonstrate how to use the new types and verify that their 
behavior is correct.
   
   The focus of this PR is to enable union, list and repeated list support in 
the RSL and associated mechanisms. It is known that support of these vector 
types is incomplete: some operators fail when presented with such vectors. It 
is not the goal here to fix those issues: this is not a PR to fully support 
these types. Rather, the scope of this PR is limited to the RSL and associated 
classes.
   
   For more information, see [this wiki 
entry](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).
   
   This PR completes the result set loader work. The next PR in this series 
will introduce revisions to the scan operator that allow readers to use the 
RSL. After that come revised implementations of the delimited text (e.g. 
CSV) and JSON readers.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
