[ 
https://issues.apache.org/jira/browse/DRILL-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015921#comment-17015921
 ] 

ASF GitHub Bot commented on DRILL-7509:
---------------------------------------

KazydubB commented on pull request #1954: DRILL-7509: Incorrect TupleSchema is 
created for DICT column when querying Parquet files
URL: https://github.com/apache/drill/pull/1954#discussion_r366851158
 
 

 ##########
 File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/util/SchemaPathUtils.java
 ##########
 @@ -50,7 +51,7 @@ public static ColumnMetadata getColumnMetadata(SchemaPath schemaPath, TupleMetad
     while (!colPath.isLastPath() && colMetadata != null) {
       if (colMetadata.isDict()) {
         // get dict's value field metadata
-        colMetadata = colMetadata.tupleSchema().metadata(0).tupleSchema().metadata(1);
+        colMetadata = colMetadata.tupleSchema().metadata(1);
 
 Review comment:
   That's a good point. `keyMetadata()` and `valueMetadata()` methods can be 
defined in `DictColumnMetadata` (as an alternative to what you've suggested, 
though that would require casting).
   
   I don't see a strong need for a static mapping of `key` and `value` for 
`DICT`, as the existing mechanism does the job and suits this case well (at 
least, to my understanding). But I do agree that hiding these implementation 
details is better practice. I will introduce a new method.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incorrect TupleSchema is created for DICT column when querying Parquet files
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-7509
>                 URL: https://issues.apache.org/jira/browse/DRILL-7509
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>             Fix For: 1.18.0
>
>
> When a {{DICT}} column is queried from a Parquet file, its {{TupleSchema}} 
> contains a nested element, e.g. `map`, which itself contains the `key` and 
> `value` fields, rather than the {{DICT}}'s {{TupleSchema}} containing the 
> `key` and `value` fields directly. The nested element, `map`, comes from the 
> inner structure of Parquet's {{MAP}} (which corresponds to Drill's {{DICT}}) 
> representation.
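
The difference between the two layouts described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyField` and `ToyDictFlattener` are hypothetical names, not Drill or Parquet classes, and the actual fix in the pull request may build the schema differently rather than flattening it after the fact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy schema node; NOT a Drill or Parquet class.
class ToyField {
  final String name;
  final List<ToyField> children = new ArrayList<>();

  ToyField(String name, ToyField... kids) {
    this.name = name;
    for (ToyField kid : kids) {
      children.add(kid);
    }
  }
}

class ToyDictFlattener {
  // If the DICT has a single wrapper child (e.g. `map`) that holds two fields,
  // return a new DICT node that holds those two fields directly.
  // Otherwise the input is returned unchanged.
  static ToyField flatten(ToyField dict) {
    if (dict.children.size() == 1 && dict.children.get(0).children.size() == 2) {
      ToyField wrapper = dict.children.get(0);
      return new ToyField(dict.name, wrapper.children.get(0), wrapper.children.get(1));
    }
    return dict;
  }
}
```

Here a node built as `d -> map -> (key, value)` models the incorrect schema, and the result of `flatten` models the expected `d -> (key, value)` layout.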



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
