morrySnow commented on code in PR #64032:
URL: https://github.com/apache/doris/pull/64032#discussion_r3425321058


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -1208,7 +1193,9 @@ public Statistics computeAssertNumRows(AssertNumRowsElement assertNumRowsElement
      * computeFilter
      */
     public Statistics computeFilter(Filter filter, Statistics inputStats) {
-        return new FilterEstimation().estimate(filter.getPredicate(), inputStats);
+        Set<Expression> conjuncts = new LinkedHashSet<>(filter.getConjuncts());

Review Comment:
   The `computeFilter` output Statistics still carries 
`conjunctsAppliedToRowCount` from the input (preserved through 
`FilterEstimation.estimate()` → `withSel()`). After the filter has consumed 
these conjuncts by removing them from the estimation predicate, the output 
should ideally clear them to prevent a downstream consumer from incorrectly 
removing them again. In practice this is unlikely to cause issues since 
partition predicates only appear immediately above scans, but it is a 
correctness hazard if the stats ever propagate to a second filter node or if 
joined/federated query paths reuse the stats.
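
   A minimal sketch of the clearing step suggested above. It is not the
   PR's code: `StatsSketch` is a hypothetical stand-in for `Statistics`,
   conjuncts are modeled as plain strings, and every remaining conjunct is
   assumed to have the same fixed selectivity.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for Statistics; conjuncts are plain strings. */
final class StatsSketch {
    final double rowCount;
    final Set<String> conjunctsAppliedToRowCount;

    StatsSketch(double rowCount, Set<String> applied) {
        this.rowCount = rowCount;
        this.conjunctsAppliedToRowCount =
                Collections.unmodifiableSet(new LinkedHashSet<>(applied));
    }

    /**
     * Estimates a filter. Conjuncts already reflected in the input row count
     * are skipped, and the result carries an empty applied set so that a
     * later filter cannot skip the same conjuncts a second time.
     */
    static StatsSketch computeFilter(Set<String> filterConjuncts, StatsSketch input,
            double selectivityPerConjunct) {
        Set<String> remaining = new LinkedHashSet<>(filterConjuncts);
        remaining.removeAll(input.conjunctsAppliedToRowCount);
        // Assumption: each remaining conjunct contributes the same selectivity.
        double rows = input.rowCount * Math.pow(selectivityPerConjunct, remaining.size());
        return new StatsSketch(rows, Collections.emptySet());
    }
}
```

   With 100 input rows, one already-applied partition conjunct, and a
   selectivity of 0.5, a filter on that conjunct plus one other yields 50
   rows. A second filter repeating the partition conjunct then yields 25,
   because the applied set was cleared rather than reused.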



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/algebra/ExternalPartitionSelection.java:
##########
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.nereids.trees.plans.algebra;
+
+import org.apache.doris.catalog.PartitionItem;
+import org.apache.doris.nereids.trees.expressions.Expression;
+import org.apache.doris.nereids.trees.expressions.Slot;
+
+import com.google.common.collect.ImmutableMap;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Partition selection state for external file scans.
+ */
+public class ExternalPartitionSelection extends PartitionSelection<String> {
+    // NOT_PRUNED means the Nereids planner does not handle the partition pruning.
+    // This can be treated as the initial value of ExternalPartitionSelection.
+    // Or used to indicate that the partition pruning is not processed.
+    public static final ExternalPartitionSelection NOT_PRUNED =
+            new ExternalPartitionSelection(0, ImmutableMap.of(), false, false);
+
+    /**
+     * total partition number before pruning.
+     */
+    public final long totalPartitionNum;
+
+    /**
+     * partition name -> partition item
+     */
+    public final Map<String, PartitionItem> selectedPartitionItems;
+
+    /**
+     * Constructor for ExternalPartitionSelection.
+     */
+    public ExternalPartitionSelection(long totalPartitionNum, Map<String, PartitionItem> selectedPartitionItems,
+            boolean partitionPruned, boolean hasPartitionConstraint) {
+        super(partitionPruned, hasPartitionConstraint);
+        this.totalPartitionNum = totalPartitionNum;
+        this.selectedPartitionItems = ImmutableMap.copyOf(Objects.requireNonNull(selectedPartitionItems,
+                "selectedPartitionItems is null"));
+    }
+
+    /**
+     * Constructor for ExternalPartitionSelection.
+     */
+    public ExternalPartitionSelection(long totalPartitionNum, Map<String, PartitionItem> selectedPartitionItems,
+            boolean partitionPruned, boolean hasPartitionConstraint, List<Slot> partitionSlots,
+            Set<Expression> conjuncts) {
+        super(partitionPruned, hasPartitionConstraint, selectedPartitionItems.keySet(), partitionSlots, conjuncts);
+        this.totalPartitionNum = totalPartitionNum;
+        this.selectedPartitionItems = ImmutableMap.copyOf(Objects.requireNonNull(selectedPartitionItems,
+                "selectedPartitionItems is null"));
+    }
+
+    public boolean isPruned() {
+        return partitionPruned && selectedPartitionItems.size() < totalPartitionNum;
+    }
+
+    /**
+     * Returns partition conjuncts that have already been applied to the selected partition row count.
+     */
+    public Set<Expression> getAppliedPartitionConjuncts(List<Slot> output) {
+        return rewriteAppliedPartitionConjuncts(selectedPartitionItems.keySet(), buildNameToSlotMap(output));
+    }
+
+    private static Map<String, Slot> buildNameToSlotMap(List<Slot> output) {
+        Map<String, Slot> map = new HashMap<>(output.size());
+        for (Slot slot : output) {
+            map.put(slot.getName().toLowerCase(), slot);
+        }
+        return map;
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this == o) {
+            return true;
+        }
+        if (o == null || getClass() != o.getClass()) {
+            return false;
+        }
+        ExternalPartitionSelection that = (ExternalPartitionSelection) o;
+        return totalPartitionNum == that.totalPartitionNum
+                && Objects.equals(selectedPartitionItems.keySet(), that.selectedPartitionItems.keySet())

Review Comment:
   `equals()` compares only `selectedPartitionItems.keySet()`, not the full Map 
values. The old `SelectedPartitions.equals()` (in `LogicalFileScan`) compared 
the full Map including `PartitionItem` values. If two instances have the same 
partition names but different `PartitionItem` objects (e.g. from different 
snapshots), they would compare equal under the new logic—a behavior change. 
Please document why comparing only keys is safe (presumably same partition name 
always implies same `PartitionItem` within a query planning context).
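
   To make the behavior change concrete, here is a small sketch, not the
   PR's code. `SelectionSketch` is a hypothetical stand-in in which partition
   items are plain strings; `sameKeys` mirrors the key-only comparison in the
   new `equals()`, and `sameEntries` mirrors the full-map comparison described
   for the old one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in; partition items are modeled as plain strings. */
final class SelectionSketch {
    final long totalPartitionNum;
    final Map<String, String> selectedPartitionItems;

    SelectionSketch(long totalPartitionNum, Map<String, String> items) {
        this.totalPartitionNum = totalPartitionNum;
        this.selectedPartitionItems = new HashMap<>(items);
    }

    /** Key-only comparison: partition names must match, values are ignored. */
    boolean sameKeys(SelectionSketch that) {
        return totalPartitionNum == that.totalPartitionNum
                && Objects.equals(selectedPartitionItems.keySet(),
                        that.selectedPartitionItems.keySet());
    }

    /** Full-map comparison: both partition names and items must match. */
    boolean sameEntries(SelectionSketch that) {
        return totalPartitionNum == that.totalPartitionNum
                && Objects.equals(selectedPartitionItems, that.selectedPartitionItems);
    }
}
```

   Two instances that both select partition `p1` but hold different items for
   it are equal under `sameKeys` and unequal under `sameEntries`, which is
   the case the requested documentation should cover.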



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSExternalTable.java:
##########
@@ -829,6 +832,17 @@ private long fetchRowCountInternal(boolean fillMetaCache) {
         return rowCount;
     }
 
+    @Override
+    public long getRowCountForSelectedPartitions(ExternalPartitionSelection partitionSelection) {
+        makeSureInitialized();
+        if (!dlaType.equals(DLAType.HIVE) || !partitionSelection.isPruned()) {
+            return UNKNOWN_ROW_COUNT;
+        }

Review Comment:
   `LOG.info("Will estimate row count for table {} from selected partition file list.", name)`
   fires every time a query on a partitioned Hive table is planned, which
   could be noisy in production logs. Consider `LOG.debug`, or guard the call
   with `if (LOG.isDebugEnabled())`.
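
   A dependency-free sketch of the guard pattern. It uses
   `java.util.logging` purely as a stand-in for the project's logger, so
   `Level.FINE` plays the role of debug and `isLoggable` the role of
   `isDebugEnabled()`. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper showing a level guard around a per-query log line. */
final class RowCountLogSketch {
    static final Logger LOG = Logger.getLogger(RowCountLogSketch.class.getName());

    /** Logs at debug-equivalent level; returns true only if the message was emitted. */
    static boolean logEstimate(String tableName) {
        if (LOG.isLoggable(Level.FINE)) {
            // The message string is only built when the level is enabled.
            LOG.fine("Will estimate row count for table " + tableName
                    + " from selected partition file list.");
            return true;
        }
        return false;
    }
}
```

   At an INFO-level configuration the call returns without building or
   emitting the message, so planning a query adds nothing to the log.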



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -1295,16 +1282,46 @@ private ColumnStatistic getColumnStatistic(
     }
 
     /**
-     * compute stats for catalogRelations except OlapScan
+     * compute stats for file scans with selected partition row count.
      */
+    public Statistics computeFileScan(LogicalFileScan fileScan) {
+        return buildFileScanStats(fileScan, fileScan.getTable(), fileScan.getPartitionSelection());
+    }

Review Comment:
   When `selectedPartitionsRowCount == UNKNOWN_ROW_COUNT` and `isPruned()` is 
false but `appliedPartitionConjuncts` is non-empty (i.e., partition predicate 
matched _all_ partitions, so `size == total`), the conjuncts are still passed 
to `buildCatalogRelationStats` with the whole-table row count. Although the net 
selectivity is 1.0 and the estimate is numerically correct, it is semantically 
misleading—the conjuncts were not actually applied to the row count. Consider 
gating conjunct propagation on `isPruned()` being true or on 
`selectedPartitionsRowCount != UNKNOWN_ROW_COUNT` to make the intent clearer.
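
   One way the second gating option could look, as a sketch only.
   `FileScanStatsSketch` and `conjunctsToPropagate` are hypothetical names,
   conjuncts are plain strings, and the value `-1` for `UNKNOWN_ROW_COUNT`
   is an assumption made for illustration.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical helper; not part of StatsCalculator. */
final class FileScanStatsSketch {
    // Assumed sentinel value, used here for illustration only.
    static final long UNKNOWN_ROW_COUNT = -1;

    /**
     * Returns the conjuncts to record as already applied to the row count.
     * They are propagated only when the selected-partition row count is known,
     * because only then does the row count reflect the partition predicates.
     */
    static Set<String> conjunctsToPropagate(long selectedPartitionsRowCount,
            Set<String> appliedPartitionConjuncts) {
        if (selectedPartitionsRowCount == UNKNOWN_ROW_COUNT) {
            return Collections.emptySet();
        }
        return appliedPartitionConjuncts;
    }
}
```

   With this gate, falling back to the whole-table row count never marks a
   partition conjunct as applied, so the recorded set matches what the row
   count actually reflects.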



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

