[ https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196619#comment-16196619 ]
ASF GitHub Bot commented on DRILL-5797: --------------------------------------- Github user dprofeta commented on a diff in the pull request: https://github.com/apache/drill/pull/976#discussion_r143403232 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java --- @@ -156,18 +160,39 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS return new ScanBatch(rowGroupScan, context, oContext, readers, implicitColumns); } - private static boolean isComplex(ParquetMetadata footer) { - MessageType schema = footer.getFileMetaData().getSchema(); + private static boolean isComplex(ParquetMetadata footer, List<SchemaPath> columns) { + if (Utilities.isStarQuery(columns)) { + MessageType schema = footer.getFileMetaData().getSchema(); - for (Type type : schema.getFields()) { - if (!type.isPrimitive()) { - return true; + for (Type type : schema.getFields()) { + if (!type.isPrimitive()) { + return true; + } } - } - for (ColumnDescriptor col : schema.getColumns()) { - if (col.getMaxRepetitionLevel() > 0) { - return true; + for (ColumnDescriptor col : schema.getColumns()) { + if (col.getMaxRepetitionLevel() > 0) { + return true; + } + } + return false; + } else { + for (SchemaPath column : columns) { + if (isColumnComplex(footer.getFileMetaData().getSchema(), column)) { + return true; + } } + return false; + } + } + + private static boolean isColumnComplex(GroupType grouptype, SchemaPath column) { + PathSegment.NameSegment root = column.getRootSegment(); + if (!grouptype.containsField(root.getPath().toLowerCase())) { + return false; + } + Type type = grouptype.getType(root.getPath().toLowerCase()); --- End diff -- ok, for `getType()`. It throws an exception so I will catch it. I don't see any `getName()` in the `SchemaPath` / `PathSegment` class. Can you tell me which `getName()` you mean? > Use more often the new parquet reader > ------------------------------------- > > Key: DRILL-5797 > URL: https://issues.apache.org/jira/browse/DRILL-5797 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet > Reporter: Damien Profeta > Assignee: Damien Profeta > Fix For: 1.12.0 > > > The choice of using the regular parquet reader of the optimized one is based > of what type of columns is in the file. But the columns that are read by the > query doesn't matter. We can increase a little bit the cases where the > optimized reader is used by checking is the projected column are simple or > not. > This is an optimization waiting for the fast parquet reader to handle complex > structure. -- This message was sent by Atlassian JIRA (v6.4.14#64029)