[jira] [Commented] (DRILL-5797) Use more often the new parquet reader

ASF GitHub Bot (JIRA) Mon, 09 Oct 2017 01:09:18 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196619#comment-16196619
 ]


ASF GitHub Bot commented on DRILL-5797:
---------------------------------------

Github user dprofeta commented on a diff in the pull request:

    https://github.com/apache/drill/pull/976#discussion_r143403232
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java
 ---
    @@ -156,18 +160,39 @@ public ScanBatch getBatch(FragmentContext context, 
ParquetRowGroupScan rowGroupS
         return new ScanBatch(rowGroupScan, context, oContext, readers, 
implicitColumns);
       }
     
    -  private static boolean isComplex(ParquetMetadata footer) {
    -    MessageType schema = footer.getFileMetaData().getSchema();
    +  private static boolean isComplex(ParquetMetadata footer, 
List<SchemaPath> columns) {
    +    if (Utilities.isStarQuery(columns)) {
    +      MessageType schema = footer.getFileMetaData().getSchema();
     
    -    for (Type type : schema.getFields()) {
    -      if (!type.isPrimitive()) {
    -        return true;
    +      for (Type type : schema.getFields()) {
    +        if (!type.isPrimitive()) {
    +          return true;
    +        }
           }
    -    }
    -    for (ColumnDescriptor col : schema.getColumns()) {
    -      if (col.getMaxRepetitionLevel() > 0) {
    -        return true;
    +      for (ColumnDescriptor col : schema.getColumns()) {
    +        if (col.getMaxRepetitionLevel() > 0) {
    +          return true;
    +        }
    +      }
    +      return false;
    +    } else {
    +      for (SchemaPath column : columns) {
    +        if (isColumnComplex(footer.getFileMetaData().getSchema(), column)) 
{
    +          return true;
    +        }
           }
    +      return false;
    +    }
    +  }
    +
    +  private static boolean isColumnComplex(GroupType grouptype, SchemaPath 
column) {
    +    PathSegment.NameSegment root = column.getRootSegment();
    +    if (!grouptype.containsField(root.getPath().toLowerCase())) {
    +      return false;
    +    }
    +    Type type = grouptype.getType(root.getPath().toLowerCase());
    --- End diff --
    
    ok, for `getType()`. It throws an exception so I will catch it.
    I don't see any `getName()` in the `SchemaPath` / `PathSegment` class. Can 
you tell me which `getName()` you mean?


> Use more often the new parquet reader
> -------------------------------------
>
>                 Key: DRILL-5797
>                 URL: https://issues.apache.org/jira/browse/DRILL-5797
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Parquet
>            Reporter: Damien Profeta
>            Assignee: Damien Profeta
>             Fix For: 1.12.0
>
>
> The choice of using the regular parquet reader of the optimized one is based 
> of what type of columns is in the file. But the columns that are read by the 
> query doesn't matter. We can increase a little bit the cases where the 
> optimized reader is used by checking is the projected column are simple or 
> not.
> This is an optimization waiting for the fast parquet reader to handle complex 
> structure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-5797) Use more often the new parquet reader

Reply via email to