morrySnow commented on code in PR #64032:
URL: https://github.com/apache/doris/pull/64032#discussion_r3425321058
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -1208,7 +1193,9 @@ public Statistics computeAssertNumRows(AssertNumRowsElement assertNumRowsElement
* computeFilter
*/
public Statistics computeFilter(Filter filter, Statistics inputStats) {
- return new FilterEstimation().estimate(filter.getPredicate(), inputStats);
+ Set<Expression> conjuncts = new LinkedHashSet<>(filter.getConjuncts());
Review Comment:
The `computeFilter` output Statistics still carries
`conjunctsAppliedToRowCount` from the input (preserved through
`FilterEstimation.estimate()` → `withSel()`). After the filter has consumed
these conjuncts by removing them from the estimation predicate, the output
should ideally clear them to prevent a downstream consumer from incorrectly
removing them again. In practice this is unlikely to cause issues since
partition predicates only appear immediately above scans, but it is a
correctness hazard if the stats ever propagate to a second filter node or if
joined/federated query paths reuse the stats.
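Here is a minimal sketch of the suggested fix, under stated assumptions. `Stats` is a hypothetical, simplified stand-in for `Statistics`, Strings stand in for `Expression`, and a flat per-conjunct selectivity replaces `FilterEstimation`. The filter skips conjuncts already reflected in the input row count. It then returns statistics whose `conjunctsAppliedToRowCount` is empty, so a second filter above it would estimate those conjuncts instead of skipping them again.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: Strings stand in for Expression, and Stats is a
// simplified stand-in for Nereids Statistics; none of this is the Doris API.
final class FilterStatsSketch {

    static final class Stats {
        final double rowCount;
        // Conjuncts whose selectivity is already reflected in rowCount.
        final Set<String> conjunctsAppliedToRowCount;

        Stats(double rowCount, Set<String> applied) {
            this.rowCount = rowCount;
            this.conjunctsAppliedToRowCount =
                    Collections.unmodifiableSet(new LinkedHashSet<>(applied));
        }
    }

    // Assumed flat selectivity per estimated conjunct, for illustration only.
    static final double SELECTIVITY_PER_CONJUNCT = 0.5;

    static Stats computeFilter(Set<String> filterConjuncts, Stats input) {
        Set<String> toEstimate = new LinkedHashSet<>(filterConjuncts);
        // Do not discount conjuncts that the scan row count already reflects.
        toEstimate.removeAll(input.conjunctsAppliedToRowCount);
        double sel = Math.pow(SELECTIVITY_PER_CONJUNCT, toEstimate.size());
        // The filter has consumed the applied conjuncts, so the output carries
        // an empty set instead of forwarding the input's set unchanged.
        return new Stats(input.rowCount * sel, Collections.emptySet());
    }

    public static void main(String[] args) {
        Stats scan = new Stats(1000, Set.of("dt = '2024-01-01'"));
        Set<String> conjuncts = new LinkedHashSet<>(List.of("dt = '2024-01-01'", "x > 10"));
        Stats out = computeFilter(conjuncts, scan);
        System.out.println(out.rowCount + " " + out.conjunctsAppliedToRowCount);
    }
}
```

Clearing the set in the filter's output, rather than teaching every consumer to ignore it, keeps the invariant local: the set only ever describes the node that produced it.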
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/algebra/ExternalPartitionSelection.java:
##########
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.nereids.trees.plans.algebra;
+
+import org.apache.doris.catalog.PartitionItem;
+import org.apache.doris.nereids.trees.expressions.Expression;
+import org.apache.doris.nereids.trees.expressions.Slot;
+
+import com.google.common.collect.ImmutableMap;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Partition selection state for external file scans.
+ */
+public class ExternalPartitionSelection extends PartitionSelection<String> {
+ // NOT_PRUNED means the Nereids planner does not handle the partition pruning.
+ // This can be treated as the initial value of ExternalPartitionSelection.
+ // Or used to indicate that the partition pruning is not processed.
+ public static final ExternalPartitionSelection NOT_PRUNED =
+ new ExternalPartitionSelection(0, ImmutableMap.of(), false, false);
+
+ /**
+ * total partition number before pruning.
+ */
+ public final long totalPartitionNum;
+
+ /**
+ * partition name -> partition item
+ */
+ public final Map<String, PartitionItem> selectedPartitionItems;
+
+ /**
+ * Constructor for ExternalPartitionSelection.
+ */
+ public ExternalPartitionSelection(long totalPartitionNum, Map<String, PartitionItem> selectedPartitionItems,
+ boolean partitionPruned, boolean hasPartitionConstraint) {
+ super(partitionPruned, hasPartitionConstraint);
+ this.totalPartitionNum = totalPartitionNum;
+ this.selectedPartitionItems = ImmutableMap.copyOf(Objects.requireNonNull(selectedPartitionItems,
+ "selectedPartitionItems is null"));
+ }
+
+ /**
+ * Constructor for ExternalPartitionSelection.
+ */
+ public ExternalPartitionSelection(long totalPartitionNum, Map<String, PartitionItem> selectedPartitionItems,
+ boolean partitionPruned, boolean hasPartitionConstraint, List<Slot> partitionSlots,
+ Set<Expression> conjuncts) {
+ super(partitionPruned, hasPartitionConstraint, selectedPartitionItems.keySet(), partitionSlots, conjuncts);
+ this.totalPartitionNum = totalPartitionNum;
+ this.selectedPartitionItems = ImmutableMap.copyOf(Objects.requireNonNull(selectedPartitionItems,
+ "selectedPartitionItems is null"));
+ }
+
+ public boolean isPruned() {
+ return partitionPruned && selectedPartitionItems.size() < totalPartitionNum;
+ }
+
+ /**
+ * Returns partition conjuncts that have already been applied to the selected partition row count.
+ */
+ public Set<Expression> getAppliedPartitionConjuncts(List<Slot> output) {
+ return rewriteAppliedPartitionConjuncts(selectedPartitionItems.keySet(), buildNameToSlotMap(output));
+ }
+
+ private static Map<String, Slot> buildNameToSlotMap(List<Slot> output) {
+ Map<String, Slot> map = new HashMap<>(output.size());
+ for (Slot slot : output) {
+ map.put(slot.getName().toLowerCase(), slot);
+ }
+ return map;
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (this == o) {
+ return true;
+ }
+ if (o == null || getClass() != o.getClass()) {
+ return false;
+ }
+ ExternalPartitionSelection that = (ExternalPartitionSelection) o;
+ return totalPartitionNum == that.totalPartitionNum
+ && Objects.equals(selectedPartitionItems.keySet(), that.selectedPartitionItems.keySet())
Review Comment:
`equals()` compares only `selectedPartitionItems.keySet()`, not the full Map
values. The old `SelectedPartitions.equals()` (in `LogicalFileScan`) compared
the full Map including `PartitionItem` values. If two instances have the same
partition names but different `PartitionItem` objects (e.g. from different
snapshots), they would compare equal under the new logic, which is a behavior change.
Please document why comparing only keys is safe (presumably same partition name
always implies same `PartitionItem` within a query planning context).
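A minimal sketch of the behavior change, assuming a String stands in for `PartitionItem` and `Selection` is a hypothetical, simplified stand-in for `ExternalPartitionSelection`. Two selections with the same partition name but different items are equal under a keys-only `equals()`, while a full-map comparison, as the old `SelectedPartitions.equals()` did, tells them apart. The Javadoc on `equals()` is one way to record the invariant in question, and `hashCode()` hashes the same key set so the two methods stay consistent.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: a String range stands in for PartitionItem, and
// Selection is a simplified stand-in for ExternalPartitionSelection.
final class PartitionEqualsSketch {

    static final class Selection {
        final long totalPartitionNum;
        final Map<String, String> selectedPartitionItems;

        Selection(long totalPartitionNum, Map<String, String> items) {
            this.totalPartitionNum = totalPartitionNum;
            this.selectedPartitionItems = Map.copyOf(items);
        }

        /**
         * Compares partition names only. This is safe only under the assumed
         * invariant that, within one planning context, a partition name always
         * maps to the same partition item.
         */
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (o == null || getClass() != o.getClass()) {
                return false;
            }
            Selection that = (Selection) o;
            return totalPartitionNum == that.totalPartitionNum
                    && selectedPartitionItems.keySet().equals(that.selectedPartitionItems.keySet());
        }

        // Hashes the same view that equals() compares, keeping the two consistent.
        @Override
        public int hashCode() {
            return Objects.hash(totalPartitionNum, selectedPartitionItems.keySet());
        }
    }

    public static void main(String[] args) {
        Selection snapshotA = new Selection(10, Map.of("dt=2024-01-01", "[2024-01-01, 2024-01-02)"));
        Selection snapshotB = new Selection(10, Map.of("dt=2024-01-01", "[2024-01-01, 2024-01-03)"));
        // Same partition name, different item: equal by name, different by full map.
        System.out.println(snapshotA.equals(snapshotB));
        System.out.println(snapshotA.selectedPartitionItems.equals(snapshotB.selectedPartitionItems));
    }
}
```

If that invariant cannot be guaranteed, comparing the full map, as the old code did, restores the previous semantics.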
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HMSExternalTable.java:
##########
@@ -829,6 +832,17 @@ private long fetchRowCountInternal(boolean fillMetaCache) {
return rowCount;
}
+ @Override
+ public long getRowCountForSelectedPartitions(ExternalPartitionSelection partitionSelection) {
+ makeSureInitialized();
+ if (!dlaType.equals(DLAType.HIVE) || !partitionSelection.isPruned()) {
+ return UNKNOWN_ROW_COUNT;
+ }
Review Comment:
`LOG.info("Will estimate row count for table {} from selected partition file list.", name)`
fires every time a query on a partitioned Hive table is planned, which could be
noisy in production logs. Consider `LOG.debug`, or guarding the call with
`if (LOG.isDebugEnabled())`.
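Here is a minimal sketch of the guarded, lower-level logging suggested here. It uses `java.util.logging` only so it runs standalone, and its `FINE` level plays the role of `DEBUG`. The original call's `{}` placeholders indicate the FE uses a different logging API. `estimateRowCount`, `buildMessage`, and the `messagesBuilt` counter are hypothetical names used to make the guard's effect observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging so it runs standalone; FINE stands in for
// DEBUG. The FE's real logger API differs.
final class DebugLogSketch {
    private static final Logger LOG = Logger.getLogger(DebugLogSketch.class.getName());

    // Counts how often the log message was actually built, to show the guard's effect.
    static int messagesBuilt = 0;

    static String buildMessage(String tableName) {
        messagesBuilt++;
        return "Will estimate row count for table " + tableName
                + " from selected partition file list.";
    }

    static void estimateRowCount(String tableName) {
        // Skip building the message entirely unless debug-level output is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(tableName));
        }
        // ... row count estimation would follow here ...
    }

    public static void main(String[] args) {
        estimateRowCount("sales");
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

With Log4j 2 or SLF4J, `{}` placeholders already defer string formatting until the level check passes. Switching to `LOG.debug(...)` alone therefore removes most of the cost. The explicit guard mainly pays off when computing an argument is itself expensive.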
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/StatsCalculator.java:
##########
@@ -1295,16 +1282,46 @@ private ColumnStatistic getColumnStatistic(
}
/**
- * compute stats for catalogRelations except OlapScan
+ * compute stats for file scans with selected partition row count.
*/
+ public Statistics computeFileScan(LogicalFileScan fileScan) {
+ return buildFileScanStats(fileScan, fileScan.getTable(), fileScan.getPartitionSelection());
+ }
Review Comment:
When `selectedPartitionsRowCount == UNKNOWN_ROW_COUNT` and `isPruned()` is
false but `appliedPartitionConjuncts` is non-empty (i.e., partition predicate
matched _all_ partitions, so `size == total`), the conjuncts are still passed
to `buildCatalogRelationStats` with the whole-table row count. Although the net
selectivity is 1.0 and the estimate is numerically correct, it is semantically
misleading: the conjuncts were not actually applied to the row count. Consider
gating conjunct propagation on `isPruned()` being true or on
`selectedPartitionsRowCount != UNKNOWN_ROW_COUNT` to make the intent clearer.
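Here is a minimal sketch of that gating, assuming simplified types. `ScanStatsInput` and `chooseInput` are hypothetical, Strings stand in for `Expression`, and `UNKNOWN_ROW_COUNT` is given an assumed sentinel value. The partition conjuncts are forwarded as applied only when the scan row count actually comes from the pruned partitions. Otherwise the whole-table count is used, and the conjuncts are left for filter estimation.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: names echo the review comment, but the types and
// signatures are simplified assumptions, not the Doris StatsCalculator API.
final class FileScanStatsSketch {
    // Assumed sentinel value for this sketch.
    static final long UNKNOWN_ROW_COUNT = -1;

    // What would be handed to buildCatalogRelationStats in this sketch.
    static final class ScanStatsInput {
        final long rowCount;
        final Set<String> appliedConjuncts;

        ScanStatsInput(long rowCount, Set<String> appliedConjuncts) {
            this.rowCount = rowCount;
            this.appliedConjuncts = appliedConjuncts;
        }
    }

    static ScanStatsInput chooseInput(long tableRowCount, long selectedPartitionsRowCount,
            boolean isPruned, Set<String> appliedPartitionConjuncts) {
        // Report conjuncts as applied only when the row count really comes from
        // the pruned partition set; otherwise leave them to filter estimation.
        if (isPruned && selectedPartitionsRowCount != UNKNOWN_ROW_COUNT) {
            return new ScanStatsInput(selectedPartitionsRowCount, appliedPartitionConjuncts);
        }
        return new ScanStatsInput(tableRowCount, Collections.emptySet());
    }

    public static void main(String[] args) {
        Set<String> conjuncts = Set.of("dt >= '2024-01-01'");
        // Predicate matched every partition: not pruned, whole-table count, nothing applied.
        ScanStatsInput all = chooseInput(1000, UNKNOWN_ROW_COUNT, false, conjuncts);
        // Real pruning with a known selected-partition count: conjuncts travel with it.
        ScanStatsInput pruned = chooseInput(1000, 200, true, conjuncts);
        System.out.println(all.rowCount + " " + all.appliedConjuncts.size());
        System.out.println(pruned.rowCount + " " + pruned.appliedConjuncts.size());
    }
}
```

Gating on both checks is the conservative choice. In the `HMSExternalTable` override above, a known row count already implies `isPruned()`, so either condition alone would behave the same for Hive tables.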