Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/637#discussion_r86042975
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
    @@ -1000,6 +1053,81 @@ public long getColumnValueCount(SchemaPath column) {
     
       @Override
       public List<SchemaPath> getPartitionColumns() {
    -    return new ArrayList<>(columnTypeMap.keySet());
    +    return new ArrayList<>(partitionColTypeMap.keySet());
       }
    +
    +  public GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities,
    +      FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) {
    +    if (fileSet.size() == 1 || ! (parquetTableMetadata instanceof Metadata.ParquetTableMetadata_v3)) {
    +      return null; // no pruning for 1 single parquet file or metadata is prior v3.
    +    }
    +
    +    final Set<SchemaPath> schemaPathsInExpr = filterExpr.accept(new ParquetRGFilterEvaluator.FieldReferenceFinder(), null);
    +
    +    final List<RowGroupMetadata> qualifiedRGs = new ArrayList<>(parquetTableMetadata.getFiles().size());
    +    Set<String> qualifiedFileNames = Sets.newHashSet(); // HashSet keeps a fileName unique.
    +
    +    ParquetFilterPredicate filterPredicate = null;
    +
    +    for (ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
    +      final ImplicitColumnExplorer columnExplorer = new ImplicitColumnExplorer(optionManager, this.columns);
    +      Map<String, String> implicitColValues = columnExplorer.populateImplicitColumns(file.getPath(), selectionRoot);
    +
    +      for (RowGroupMetadata rowGroup : file.getRowGroups()) {
    +        ParquetMetaStatCollector statCollector = new ParquetMetaStatCollector(
    +            parquetTableMetadata,
    +            rowGroup.getColumns(),
    +            implicitColValues);
    +
    +        Map<SchemaPath, ColumnStatistics> columnStatisticsMap = statCollector.collectColStat(schemaPathsInExpr);
    --- End diff --
    
    Shouldn't we be able to build the filter predicate once, outside the for
    loop? Or does it have to be built inside the loop because it depends on
    the implicit column values?
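
To make the question concrete, here is a minimal sketch of the hoisted variant. It assumes the predicate depends only on the filter expression and not on per-file state such as the implicit column values. Every name in it (PredicateHoistSketch, ColumnStats, RowGroup, buildCanDrop, prune) is a hypothetical stand-in, not a Drill class or method, and the filter is reduced to a single "column > bound" comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; none of these are Drill classes.
class PredicateHoistSketch {

  /** Min/max statistics for one column of one row group. */
  static final class ColumnStats {
    final long min;
    final long max;

    ColumnStats(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  /** A row group: a name plus per-column statistics. */
  static final class RowGroup {
    final String name;
    final Map<String, ColumnStats> stats;

    RowGroup(String name, Map<String, ColumnStats> stats) {
      this.name = name;
      this.stats = stats;
    }
  }

  /**
   * Builds a "can drop" predicate for the filter "column > bound".
   * It depends only on the filter, not on any file or row group,
   * which is the assumption that makes hoisting valid.
   */
  static Predicate<Map<String, ColumnStats>> buildCanDrop(String column, long bound) {
    return stats -> {
      ColumnStats s = stats.get(column);
      // Without statistics nothing can be proven, so the row group is kept.
      return s != null && s.max <= bound;
    };
  }

  /** Returns the names of the row groups that survive pruning. */
  static List<String> prune(List<RowGroup> rowGroups, String column, long bound) {
    // Built once, outside the loop.
    Predicate<Map<String, ColumnStats>> canDrop = buildCanDrop(column, bound);

    List<String> kept = new ArrayList<>();
    for (RowGroup rowGroup : rowGroups) {
      // Only the statistics change per iteration.
      if (!canDrop.test(rowGroup.stats)) {
        kept.add(rowGroup.name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<RowGroup> rowGroups = List.of(
        new RowGroup("rg0", Map.of("a", new ColumnStats(0, 10))),
        new RowGroup("rg1", Map.of("a", new ColumnStats(5, 50))),
        new RowGroup("rg2", Map.of("b", new ColumnStats(0, 1))));
    System.out.println(prune(rowGroups, "a", 10));
  }
}
```

If building the predicate really does need something that varies per file, this shape does not apply as written, and the most that could be hoisted is the part that depends on the filter expression alone.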


