GitHub user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/905#discussion_r140417433
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdMaxRowCount.java ---
    @@ -0,0 +1,71 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to you under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +* http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +package org.apache.drill.exec.planner.cost;
    +
    +import org.apache.calcite.plan.volcano.RelSubset;
    +import org.apache.calcite.rel.SingleRel;
    +import org.apache.calcite.rel.core.TableScan;
    +import org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider;
    +import org.apache.calcite.rel.metadata.RelMdMaxRowCount;
    +import org.apache.calcite.rel.metadata.RelMetadataProvider;
    +import org.apache.calcite.rel.metadata.RelMetadataQuery;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.drill.exec.planner.physical.AbstractPrel;
    +import org.apache.drill.exec.planner.physical.ScanPrel;
    +
    +/**
    + * DrillRelMdMaxRowCount supplies a specific implementation of
    + * {@link RelMetadataQuery#getMaxRowCount} for Drill.
    + */
    +public class DrillRelMdMaxRowCount extends RelMdMaxRowCount {
    +
    +  private static final DrillRelMdMaxRowCount INSTANCE = new DrillRelMdMaxRowCount();
    +
    +  public static final RelMetadataProvider SOURCE = ReflectiveRelMetadataProvider.reflectiveSource(BuiltInMethod.MAX_ROW_COUNT.method, INSTANCE);
    +
    +  public Double getMaxRowCount(ScanPrel rel, RelMetadataQuery mq) {
    +    // the actual row count is known so returns its value
    +    return rel.estimateRowCount(mq);
    --- End diff --
    
    Returning the *estimated* row count means this is only an estimate, not a
    guaranteed bound: the actual number of rows could be higher. Looking at what
    several of the storage/format plugins report through estimateRowCount(), a
    number of them use NO_EXACT_ROW_COUNT; for instance, see [1] for the text
    format plugin. So I feel overloading getMaxRowCount() to return an estimate
    may cause problems, because callers of getMaxRowCount() are entitled to treat
    its result as a hard upper bound. If you look at the semantics of
    getMaxRowCount in Calcite's RelMdMaxRowCount, it is only intended for cases
    where **_during planning time_** we can guarantee that the row count will
    never exceed that value. For example, an Aggregate with no group-by clause
    (which always produces exactly one row) or a LIMIT.
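To make the estimate-versus-bound distinction concrete, here is a minimal, Calcite-free sketch in Java. It assumes a hypothetical `ScanStatsSketch` holder whose `exact` flag stands in for whether a plugin reports an exact row count or NO_EXACT_ROW_COUNT; none of the names below are Drill or Calcite API. A scan yields a max row count only when its count is exact and otherwise reports no bound (`Double.POSITIVE_INFINITY`), while an Aggregate with no GROUP BY and a LIMIT produce genuine planning-time bounds regardless of input size.

```java
/**
 * Hypothetical, Calcite-free sketch of "max row count" semantics: a value is
 * reported only when it is a guaranteed upper bound at planning time;
 * otherwise the bound is unknown (positive infinity).
 */
public final class MaxRowCountSketch {

  /** Hypothetical stand-in for a scan's statistics (not Drill's ScanStats). */
  public static final class ScanStatsSketch {
    final double rowCount;
    final boolean exact; // true only if the plugin knows the exact row count

    public ScanStatsSketch(double rowCount, boolean exact) {
      this.rowCount = rowCount;
      this.exact = exact;
    }
  }

  private MaxRowCountSketch() {}

  /** Scan: an estimate is not an upper bound, so only exact counts qualify. */
  public static double scanMaxRowCount(ScanStatsSketch stats) {
    return stats.exact ? stats.rowCount : Double.POSITIVE_INFINITY;
  }

  /** Aggregate: with no GROUP BY it always yields exactly one row, even on empty input. */
  public static double aggregateMaxRowCount(boolean hasGroupBy, double inputMax) {
    // With a single GROUP BY, there cannot be more groups than input rows.
    return hasGroupBy ? inputMax : 1.0;
  }

  /** LIMIT fetch OFFSET offset: never more than fetch rows, nor more than what survives the offset. */
  public static double limitMaxRowCount(double inputMax, long offset, long fetch) {
    double afterOffset = Math.max(inputMax - offset, 0.0);
    return Math.min(afterOffset, (double) fetch);
  }

  public static void main(String[] args) {
    ScanStatsSketch textFile = new ScanStatsSketch(1_000_000, false); // estimate only
    ScanStatsSketch exactTable = new ScanStatsSketch(42, true);

    System.out.println(scanMaxRowCount(textFile));   // Infinity
    System.out.println(scanMaxRowCount(exactTable)); // 42.0
    System.out.println(aggregateMaxRowCount(false, scanMaxRowCount(textFile))); // 1.0
    System.out.println(limitMaxRowCount(scanMaxRowCount(textFile), 0, 10));     // 10.0
  }
}
```

Under this reading, a handler for ScanPrel would report "unbounded" whenever the plugin's count is not exact, rather than passing estimateRowCount() through as a maximum; the bound then only tightens where an operator such as LIMIT or a group-by-less Aggregate can actually guarantee it.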
    
    
    [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java#L186

