Github user sachouche commented on the issue:

https://github.com/apache/drill/pull/976

Looking at the stack trace:
- The code is definitely initializing a column of type REPEATED.
- The Fast Reader didn't expect this scenario, so it used a default container (NullableVarBinary) for the variable-length (VL) binary data type.

Why is this happening?
- The code in ReadState::buildReader() processes all selected columns.
- This information is obtained from the ParquetSchema.
- Looking at the code, this seems to be a case-sensitivity issue: the ParquetSchema is case-insensitive, whereas the Parquet GroupType is not.
- Damien added a catch handler (column not found) to handle use cases where we project non-existent columns.
- This is leading to an unforeseen use case:
  - Assume column XYZ is complex.
  - The user refers to it by an alias (xyz).
  - The new code lets this column through (the case-sensitive GroupType lookup fails and the column-not-found handler catches it) and treats it as simple.
  - Because the ParquetSchema is case-insensitive, it still processes this column,
  - hence the exception in the test suite.

Suggested fix:
- Create a map keyed by the lower-cased column name and register all columns of the current row group.
- Use this map to look up the type of each selected column.
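A minimal sketch of the suggested fix, assuming a simplified model of the row-group metadata. `CaseInsensitiveColumnLookup`, `RowGroupColumn`, `buildColumnMap` and `findSelectedColumn` are hypothetical names for illustration only; they are not Drill's or parquet-mr's actual classes or methods. The point is that a selected column such as `xyz` resolves to the row-group column `XYZ` (and so keeps its complex/repeated type) instead of falling into the "column not found" path.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseInsensitiveColumnLookup {

  // Hypothetical stand-in for a row-group column descriptor:
  // its name as stored in the file and whether it is repeated/complex.
  static final class RowGroupColumn {
    final String name;
    final boolean repeated;

    RowGroupColumn(String name, boolean repeated) {
      this.name = name;
      this.repeated = repeated;
    }
  }

  // Register every column of the current row group under its lower-cased name.
  static Map<String, RowGroupColumn> buildColumnMap(List<RowGroupColumn> columns) {
    Map<String, RowGroupColumn> byName = new HashMap<>();
    for (RowGroupColumn c : columns) {
      byName.put(c.name.toLowerCase(Locale.ROOT), c);
    }
    return byName;
  }

  // Resolve a selected (projected) column regardless of the case used in the query.
  // Returns null only when the column truly does not exist in the row group.
  static RowGroupColumn findSelectedColumn(Map<String, RowGroupColumn> byName, String selected) {
    return byName.get(selected.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    Map<String, RowGroupColumn> byName = buildColumnMap(List.of(
        new RowGroupColumn("XYZ", true),
        new RowGroupColumn("id", false)));

    // The alias "xyz" now finds the complex column instead of being treated as missing.
    RowGroupColumn hit = findSelectedColumn(byName, "xyz");
    System.out.println(hit.name + " repeated=" + hit.repeated);

    // A genuinely non-existent column still yields null, so the
    // "project a non-existing column" use case keeps working.
    System.out.println(findSelectedColumn(byName, "missing") == null);
  }
}
```

Using `Locale.ROOT` keeps lower-casing independent of the JVM's default locale. One caveat of this design: if a row group contains two columns whose names differ only by case, the later one overwrites the earlier one in the map, so such a collision would need explicit handling.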
---