[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

ASF GitHub Bot (Jira) Mon, 11 Nov 2019 23:29:34 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972137#comment-16972137
 ]


ASF GitHub Bot commented on DRILL-7359:
---------------------------------------

paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421
 
 
   The schema builder issue has been nagging at me. I observed above that the 
current way of handling nested types is getting too complex, with all the 
special cases and "resumeFoo" methods.
   
   The current design was originally designed to build a `BatchSchema` with 
`MaterializedField`s,. Those classes require a top-down construction. (That is 
also why the final `build()` method returns a `BatchSchema` and why we must 
call `buildSchema()` to get a `TupleMetadata`.)
   
   It turns out that the `SchemaMetadata` classes *do not* have the same 
constraints. They can actually be built bottom-up. It never occurred to me to 
clean up the old-style of building when we repurposed `SchemaBuilder` to build 
metadata objects.
   
   So, since we can change the structure, we could borrow a technique from, I 
think, certain JSON builders. Rather than having the schema builder create the 
nested type builder, ask the user to do so explicitly:
   
   ```
   TupleMetadata schema = new SchemaBuilder()
     .addMap(new MapBuilder("myMap")
       .add("name", MinorType.VARCHAR)
       .addMap(new MapBuilder("nested")
         .add("id", MinorType.INT)
         .build())
       .build())
     .add("last", MinorType.INT)
     .build();
   ```
   
   Adopting this model has additional benefits. It is currently a bit awkward 
to create a bare `ColumnMetadata`. We could create helper methods to create 
each kind of column schema. Something like:
   
   ```
   foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED);
   bar = SchemaBuilder.dict("bar", MinorType.VARCHAR)
     .value(SchemaBuilder.map(...))
     .build();
   ```
   
   Lots of things we can play with.
   
   Given all of this, my advice for the `DICT` type is to figure out something 
that works with the structure we already have. Then, as a separate PR, we can 
clean up the `SchemaBuilder` to make it easier to use for the many complex 
types we now support. That way, you don't have to solve the schema builder 
problem in this PR.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Add support for DICT type in RowSet Framework
> ---------------------------------------------
>
>                 Key: DRILL-7359
>                 URL: https://issues.apache.org/jira/browse/DRILL-7359
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (DRILL-7359) Add support for DICT type in RowSet Framework

Reply via email to