[
https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234473#comment-16234473
]
ASF GitHub Bot commented on DRILL-5797:
---------------------------------------
Github user sachouche commented on the issue:
https://github.com/apache/drill/pull/976
Looking at the stack trace:
- The code is definitely initializing a column of type REPEATED
- The Fast Reader did not expect this scenario, so it used a default
container (NullableVarBinary) for the variable-length (VL) binary data type
Why is this happening?
- The code in ReadState::buildReader() processes all selected columns
- This information is obtained from the ParquetSchema
- Looking at the code, this appears to be a case-sensitivity issue
- The ParquetSchema is case-insensitive, whereas the Parquet GroupType is not
- Damien added a catch handler (column not found) to handle use cases where
we project non-existent columns
- This leads to an unforeseen use case:
- Assume column XYZ is complex
- The user refers to it with different casing (xyz)
- The new code lets this column pass and treats it as simple
- Because the ParquetSchema is case-insensitive, it still processes this column
- Hence the exception in the test suite
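To make the mismatch concrete, here is a minimal Java sketch using plain maps as stand-ins (an assumption, not Drill or Parquet code): a TreeMap with String.CASE_INSENSITIVE_ORDER plays the case-insensitive ParquetSchema, a HashMap plays the case-sensitive GroupType, and the hypothetical classifyProjectedColumn() mimics the "column not found" handling by treating an unfound column as simple.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain maps stand in for Drill's ParquetSchema
// (case-insensitive) and Parquet's GroupType (case-sensitive).
public class CaseMismatchDemo {

    // Case-insensitive lookup, standing in for the ParquetSchema.
    public static Map<String, String> caseInsensitiveSchema() {
        Map<String, String> schema = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        schema.put("XYZ", "complex");
        return schema;
    }

    // Case-sensitive lookup, standing in for a GroupType field lookup by name.
    public static Map<String, String> caseSensitiveGroupType() {
        Map<String, String> fields = new HashMap<>();
        fields.put("XYZ", "complex");
        return fields;
    }

    // Mimics the "column not found" handler: a column the case-sensitive
    // lookup cannot find is assumed not to exist, and so is treated as simple.
    public static String classifyProjectedColumn(String projected) {
        String type = caseSensitiveGroupType().get(projected);
        return type != null ? type : "simple";
    }

    public static void main(String[] args) {
        String projected = "xyz";
        // The schema still finds the column, so it will be read...
        System.out.println("schema sees: " + caseInsensitiveSchema().get(projected));
        // ...but the projection check has classified it as simple.
        System.out.println("projection check says: " + classifyProjectedColumn(projected));
    }
}
```

Running main prints "schema sees: complex" followed by "projection check says: simple", which is the inconsistency that sends a complex column down the fast-reader path.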
Suggested Fix
- Create a map keyed by the lower-cased column name and register all columns
of the current row group
- Use this map to look up the type of each selected column
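A minimal sketch of the suggested fix, assuming a hypothetical RowGroupColumn holder in place of the real Parquet column metadata; RowGroupColumnIndex and find() are illustrative names, not existing Drill classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch of the suggested fix: index a row group's columns by lower-cased name
// so a projected column is matched regardless of case.
// RowGroupColumn is a hypothetical stand-in for real Parquet column metadata.
public class RowGroupColumnIndex {

    public static final class RowGroupColumn {
        public final String name;
        public final boolean complex; // true for repeated or nested (group) columns

        public RowGroupColumn(String name, boolean complex) {
            this.name = name;
            this.complex = complex;
        }
    }

    private final Map<String, RowGroupColumn> byLowerName = new HashMap<>();

    public RowGroupColumnIndex(List<RowGroupColumn> rowGroupColumns) {
        for (RowGroupColumn column : rowGroupColumns) {
            // Locale.ROOT avoids locale-specific lower-casing (e.g. Turkish dotless i).
            byLowerName.put(column.name.toLowerCase(Locale.ROOT), column);
        }
    }

    // Returns the matching column, or null if the projected column does not exist.
    public RowGroupColumn find(String projectedName) {
        return byLowerName.get(projectedName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        RowGroupColumnIndex index = new RowGroupColumnIndex(List.of(
                new RowGroupColumn("XYZ", true),
                new RowGroupColumn("id", false)));
        System.out.println(index.find("xyz").complex); // true
        System.out.println(index.find("missing"));     // null
    }
}
```

In this sketch, if two columns in a row group differ only in case, the last one registered wins; a real fix would need to decide how to handle such collisions.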
> Use more often the new parquet reader
> -------------------------------------
>
> Key: DRILL-5797
> URL: https://issues.apache.org/jira/browse/DRILL-5797
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Parquet
> Reporter: Damien Profeta
> Assignee: Damien Profeta
> Priority: Major
> Fix For: 1.12.0
>
>
> The choice between the regular parquet reader and the optimized one is based
> on the types of columns in the file, regardless of which columns the query
> actually reads. We can slightly increase the number of cases where the
> optimized reader is used by checking whether the projected columns are simple
> or not.
> This is an interim optimization until the fast parquet reader can handle
> complex structures.
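The improvement described in the issue could look roughly like the following Java sketch. It assumes the file's columns have already been classified as simple or complex and indexed by lower-cased name, as in the suggested fix; ReaderChoice, Reader and choose() are hypothetical names, not Drill's API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: decide per query, from the projected columns only,
// whether the fast (optimized) reader can be used.
public class ReaderChoice {

    public enum Reader { FAST, REGULAR }

    /**
     * @param complexByLowerName every column in the file, keyed by lower-cased
     *                           name, mapped to true if it is complex
     * @param projected          names of the columns the query reads (any case)
     */
    public static Reader choose(Map<String, Boolean> complexByLowerName, Set<String> projected) {
        for (String column : projected) {
            // Normalize case so an alias like "xyz" still matches column "XYZ".
            String key = column.toLowerCase(Locale.ROOT);
            // Columns absent from the file are treated as simple, mirroring
            // the "column not found" handling mentioned in the comment.
            if (complexByLowerName.getOrDefault(key, false)) {
                return Reader.REGULAR;
            }
        }
        return Reader.FAST;
    }

    public static void main(String[] args) {
        Map<String, Boolean> file = Map.of("xyz", true, "id", false, "name", false);
        System.out.println(choose(file, Set.of("id", "NAME"))); // FAST
        System.out.println(choose(file, Set.of("id", "XYZ")));  // REGULAR
    }
}
```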
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)