paul-rogers commented on a change in pull request #1954: DRILL-7509: Incorrect TupleSchema is created for DICT column when querying Parquet files URL: https://github.com/apache/drill/pull/1954#discussion_r366601149
########## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/util/SchemaPathUtils.java ########## @@ -50,7 +51,7 @@ public static ColumnMetadata getColumnMetadata(SchemaPath schemaPath, TupleMetad while (!colPath.isLastPath() && colMetadata != null) { if (colMetadata.isDict()) { // get dict's value field metadata - colMetadata = colMetadata.tupleSchema().metadata(0).tupleSchema().metadata(1); + colMetadata = colMetadata.tupleSchema().metadata(1); Review comment: This also has a bad "code smell". We are asking each tool that uses metadata to know how to reference members within a map or DICT. Before DICT, the only type with internal structure was a MAP, which was represented as a tuple, so the original code made sense. But, with DICT, we now no longer tie the idea of "has named members" with the idea of "has a nested tuple schema". I wonder, should we add to the `ColumnMetadata` class a `member(String/int)` method? For, `MAP`, it would turn around and call into the tuple schema. For a `DICT`, it would have a static mapping of `key`/`value` and `0`/`1`. Thinking more generally, a `UNION` could could implement `member(String)` as an access to the subtype given a type. Not necessary in this PR, but perhaps we can file a JIRA for several DICT-aware cleanups. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services