[ 
https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16454503#comment-16454503
 ] 

Oleksandr Kalinin commented on DRILL-5797:
------------------------------------------

While debugging the complex12.q failure from the list of failing queries above, 
I found what appears to be another issue, not related to case sensitivity. 

If the file schema has a primitive column A and a repeated column B with a 
nested column A (B.A), then executing the query 'select A from ....' leads to 
the following scenario:

(1) The rowGroupScan passed to ParquetScanBatchCreator contains only column A. 
This is handled correctly by the code in the PR, which allows the fast reader.
(2) However, the ParquetSchema passed to ReadState contains both A and B.A, 
which leads to the failure explained above in this JIRA, because B.A is complex.
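
The mismatch between (1) and (2) can be illustrated with a small standalone 
sketch. It does not use Drill classes: ProjectionCheck, Column and both check 
methods are hypothetical names, and matching columns by leaf name is only an 
assumed way such a mismatch could arise, not a confirmed description of what 
Drill does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Drill.
final class ProjectionCheck {

    // Stand-in for a file column: dotted path plus a "complex" flag
    // (true for repeated or nested columns such as B.A).
    static final class Column {
        final String path;
        final boolean complex;

        Column(String path, boolean complex) {
            this.path = path;
            this.complex = complex;
        }

        // Last segment of the dotted path, e.g. "A" for "B.A".
        String leafName() {
            return path.substring(path.lastIndexOf('.') + 1);
        }
    }

    // ASSUMPTION: selecting by leaf name also picks up B.A for projection "A",
    // so a complex column is seen and the fast reader is rejected.
    static boolean fastReaderOkByLeafName(List<Column> fileSchema, List<String> projected) {
        for (Column c : fileSchema) {
            if (projected.contains(c.leafName()) && c.complex) {
                return false;
            }
        }
        return true;
    }

    // Selecting by full path picks up only the top-level A.
    static boolean fastReaderOkByFullPath(List<Column> fileSchema, List<String> projected) {
        for (Column c : fileSchema) {
            if (projected.contains(c.path) && c.complex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
                new Column("A", false),   // primitive top-level column
                new Column("B.A", true)); // nested column under repeated B
        List<String> projected = Arrays.asList("A");

        System.out.println(fastReaderOkByLeafName(schema, projected)); // false
        System.out.println(fastReaderOkByFullPath(schema, projected)); // true
    }
}
```

With the schema from the scenario above, the two checks disagree for the same 
projection, which is the kind of inconsistency described in (1) and (2).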

This looks like an additional issue, though it is not related to the PR code. 
I could also reproduce the case sensitivity issue and am currently 
investigating both.

> Use more often the new parquet reader
> -------------------------------------
>
>                 Key: DRILL-5797
>                 URL: https://issues.apache.org/jira/browse/DRILL-5797
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: Damien Profeta
>            Assignee: Damien Profeta
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The choice between the regular Parquet reader and the optimized one is based 
> on the types of the columns in the file; the columns actually read by the 
> query are not taken into account. We can slightly increase the number of 
> cases where the optimized reader is used by checking whether the projected 
> columns are simple or not.
> This is an interim optimization until the fast Parquet reader can handle 
> complex structures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
