[GitHub] [drill] paul-rogers commented on a change in pull request #1954: DRILL-7509: Incorrect TupleSchema is created for DICT column when querying Parquet files

GitBox Tue, 14 Jan 2020 14:31:12 -0800

paul-rogers commented on a change in pull request #1954: DRILL-7509: Incorrect 
TupleSchema is created for DICT column when querying Parquet files
URL: https://github.com/apache/drill/pull/1954#discussion_r366601149


 ##########
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/util/SchemaPathUtils.java
 ##########
 @@ -50,7 +51,7 @@ public static ColumnMetadata getColumnMetadata(SchemaPath 
schemaPath, TupleMetad
     while (!colPath.isLastPath() && colMetadata != null) {
       if (colMetadata.isDict()) {
         // get dict's value field metadata
-        colMetadata = 
colMetadata.tupleSchema().metadata(0).tupleSchema().metadata(1);
+        colMetadata = colMetadata.tupleSchema().metadata(1);
 
 Review comment:
   This also has a bad "code smell". We are asking each tool that uses metadata 
to know how to reference members within a map or DICT. Before DICT, the only 
type with internal structure was a MAP, which was represented as a tuple, so 
the original code made sense. But, with DICT, we now no longer tie the idea of 
"has named members" with the idea of "has a nested tuple schema".
   
   I wonder, should we add to the `ColumnMetadata` class a `member(String/int)` 
method? For, `MAP`, it would turn around and call into the tuple schema. For a 
`DICT`, it would have a static mapping of `key`/`value` and `0`/`1`.
   
   Thinking more generally, a `UNION` could could implement `member(String)` as 
an access to the subtype given a type.
   
   Not necessary in this PR, but perhaps we can file a JIRA for several 
DICT-aware cleanups.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on a change in pull request #1954: DRILL-7509: Incorrect TupleSchema is created for DICT column when querying Parquet files

Reply via email to