[
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972136#comment-16972136
]
ASF GitHub Bot commented on DRILL-7359:
---------------------------------------
paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421
The schema builder issue has been nagging at me. I observed above that the
current way of handling nested types is getting too complex, with all the
special cases and "resumeFoo" methods.
The current design was originally designed to build a `BatchSchema` with
`MaterializedField`s,. Those classes require a top-down construction. (That is
also why the final `build()` method returns a `BatchSchema` and why we must
call `buildSchema()` to get a `TupleMetadata`.)
It turns out that the `SchemaMetadata` classes *do not* have the same
constraints. They can actually be built bottom-up. It never occurred to me to
clean up the old-style of building when we repurposed `SchemaBuilder` to build
metadata objects.
So, since we can change the structure, we could borrow a technique from, I
think, certain JSON builders. Rather than having the schema builder create the
nested type builder, ask the user to do so explicitly:
```
TupleMetadata schema = new SchemaBuilder()
.addMap(new MapBuilder("myMap")
.add("name", MinorType.VARCHAR)
.addMap(new MapBuilder("nested")
.add("id", MinorType.INT)
.build())
.build())
.add("last", MinorType.INT)
.build();
```
Adopting this model as additional benefits. It is currently a bit awkward to
create a bare `ColumnMetadata`. We could create helper methods to create each
kind of column schema. Something like:
```
foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED);
bar = SchemaBuilder.dict("bar", MinorType.VARCHAR)
.value(SchemaBuilder.map(...))
.build();
```
Lots of things we can play with.
Given all of this, my advice for the `DICT` type is to figure out something
that works with the structure we already have. Then, as a separate PR, we can
clean up the `SchemaBuilder` to make it easier to use for the many complex
types we now support. That way, you don't have to solve the schema builder
problem in this PR.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Add support for DICT type in RowSet Framework
> ---------------------------------------------
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Bohdan Kazydub
> Assignee: Bohdan Kazydub
> Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework
--
This message was sent by Atlassian Jira
(v8.3.4#803005)