amansinha100 commented on a change in pull request #1646: DRILL-6852: Adapt current Parquet Metadata cache implementation to use Drill Metastore API
URL: https://github.com/apache/drill/pull/1646#discussion_r260904476
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
##########
@@ -0,0 +1,734 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.base;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import org.apache.commons.collections.CollectionUtils;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionStringBuilder;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.compile.sig.ConstantExpressionIdentifier;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
+import org.apache.drill.exec.expr.stat.RowsMatch;
+import org.apache.drill.exec.ops.UdfUtilities;
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.SchemaPathUtils;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.FileSelection;
+import org.apache.drill.exec.store.parquet.FilterEvaluatorUtils;
+import org.apache.drill.exec.store.parquet.ParquetTableMetadataUtils;
+import org.apache.drill.metastore.BaseMetadata;
+import org.apache.drill.metastore.ColumnStatistics;
+import org.apache.drill.metastore.ColumnStatisticsKind;
+import org.apache.drill.metastore.FileMetadata;
+import org.apache.drill.metastore.LocationProvider;
+import org.apache.drill.metastore.PartitionMetadata;
+import org.apache.drill.metastore.TableMetadata;
+import org.apache.drill.metastore.TableStatisticsKind;
+import org.apache.drill.metastore.expr.FilterBuilder;
+import org.apache.drill.metastore.expr.FilterPredicate;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Represents table group scan with metadata usage.
+ */
+public abstract class AbstractGroupScanWithMetadata extends AbstractFileGroupScan {
+
+  protected TableMetadataProvider metadataProvider;

Review comment:
At first, I was thinking that a TableMetadataProvider would provide table metadata for multiple tables (the term 'provider' seems to imply that). However, here each table has its own provider which generates the corresponding TableMetadata.
One thing we should measure is the impact of these changes on heap usage at planning time, using a somewhat complex query (e.g., a 10-table join from TPC-DS). Previously, each table would have one ParquetGroupScan and, within that, a list of RowGroupInfo objects plus other related objects. With this PR, the additional objects would be the TableMetadataProvider, the TableMetadata, a list of FileMetadata and a list of PartitionMetadata (if partitions are present). The row-group-level information is bundled into RowGroupMetadata. We want to move towards a 'lean planner' where complex query plans can be generated very quickly, so let's keep that requirement in mind.
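
For the heap measurement, a rough probe along the following lines might be enough as a first pass. It is only a sketch built on the JDK's MemoryMXBean: the class PlanningHeapProbe and the helper fakePlanningMetadata are hypothetical names that exist nowhere in Drill, and the allocation in fakePlanningMetadata is a stand-in for planning a real query. Since gc() is only a request to the JVM, the reported delta is approximate and should be compared across several runs before and after the PR.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Drill: estimates how much heap stays
// retained by the objects a task (e.g. query planning) leaves behind.
final class PlanningHeapProbe {

  private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

  // Holds the task result (kept reachable) together with the measured delta.
  static final class Result<T> {
    final T value;
    final long heapDeltaBytes;

    Result(T value, long heapDeltaBytes) {
      this.value = value;
      this.heapDeltaBytes = heapDeltaBytes;
    }
  }

  // Requests a GC (a hint only) and reads the currently used heap in bytes.
  static long usedHeapAfterGc() {
    MEMORY.gc();
    return MEMORY.getHeapMemoryUsage().getUsed();
  }

  // Runs the task and reports the change in used heap while its result is still referenced.
  static <T> Result<T> measure(Supplier<T> task) {
    long before = usedHeapAfterGc();
    T value = task.get();
    long after = usedHeapAfterGc();
    return new Result<>(value, after - before);
  }

  // Stand-in for planning: allocates one small array per assumed "row group".
  // In a real measurement this would be replaced by planning the TPC-DS query.
  static List<long[]> fakePlanningMetadata(int tables, int rowGroupsPerTable) {
    List<long[]> retained = new ArrayList<>(tables * rowGroupsPerTable);
    for (int t = 0; t < tables; t++) {
      for (int r = 0; r < rowGroupsPerTable; r++) {
        retained.add(new long[16]);
      }
    }
    return retained;
  }

  public static void main(String[] args) {
    Result<List<long[]>> result = measure(() -> fakePlanningMetadata(10, 1_000));
    System.out.printf("retained %d objects, heap delta ~%d KiB%n",
        result.value.size(), result.heapDeltaBytes / 1024);
  }
}
```

Running the same probe on a build with and without this PR, with the supplier replaced by the actual planning call, would give a first-order comparison; a heap dump or a profiler would be needed to attribute the difference to specific classes such as FileMetadata or PartitionMetadata.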
