github-actions[bot] commented on code in PR #64648:
URL: https://github.com/apache/doris/pull/64648#discussion_r3496623129


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java:
##########
@@ -1150,20 +1150,46 @@ public long getCountFromSnapshot() throws UserException 
{
             return 0;
         }
 
-        Map<String, String> summary = snapshot.summary();
-        if (!summary.get(IcebergUtils.TOTAL_EQUALITY_DELETES).equals("0")) {
+        return getCountFromSummary(snapshot.summary(), 
sessionVariable.ignoreIcebergDanglingDelete);
+    }
+
+    /**
+     * Decide whether the row count can be pushed down from an Iceberg snapshot
+     * summary, returning the count, or -1 if it cannot be pushed down (the
+     * caller then falls back to a normal scan).
+     *
+     * <p>A snapshot summary is NOT guaranteed to carry the {@code total-*}
+     * counters: snapshots produced by compaction/replace (and some writers)
+     * may omit {@code total-equality-deletes} / {@code total-position-deletes}
+     * / {@code total-records}. Previously this method called
+     * {@code summary.get(...).equals(...)} / {@code Long.parseLong(...)}
+     * directly on the map values, which threw a {@link NullPointerException}
+     * (e.g. "Cannot invoke String.equals because Map.get(...) is null") for
+     * {@code SELECT COUNT(*)} on such a table while {@code SELECT *} worked.
+     * When any required counter is absent we now fall back to a scan.
+     */
+    @VisibleForTesting
+    public static long getCountFromSummary(Map<String, String> summary, 
boolean ignoreDanglingDelete) {
+        String equalityDeletes = 
summary.get(IcebergUtils.TOTAL_EQUALITY_DELETES);
+        String positionDeletes = 
summary.get(IcebergUtils.TOTAL_POSITION_DELETES);
+        String totalRecords = summary.get(IcebergUtils.TOTAL_RECORDS);
+        if (equalityDeletes == null || positionDeletes == null || totalRecords 
== null) {
+            // summary is missing a required counter, can not push down count
+            return -1;
+        }
+        if (!equalityDeletes.equals("0")) {
             // has equality delete files, can not push down count
             return -1;
         }
 
-        long deleteCount = 
Long.parseLong(summary.get(IcebergUtils.TOTAL_POSITION_DELETES));
+        long deleteCount = Long.parseLong(positionDeletes);
         if (deleteCount == 0) {
             // no delete files, can push down count directly
-            return Long.parseLong(summary.get(IcebergUtils.TOTAL_RECORDS));
+            return Long.parseLong(totalRecords);
         }
-        if (sessionVariable.ignoreIcebergDanglingDelete) {
+        if (ignoreDanglingDelete) {
             // has position delete files, if we ignore dangling delete, can 
push down count
-            return Long.parseLong(summary.get(IcebergUtils.TOTAL_RECORDS)) - 
deleteCount;
+            return Long.parseLong(totalRecords) - deleteCount;

Review Comment:
   When this subtraction returns `0` (for example `total-records == 
total-position-deletes` with `ignore_iceberg_dangling_delete=true`), FE still 
marks the scan as table-level COUNT pushdown because `isBatchMode()` accepts 
any `countFromSnapshot >= 0` and `assignCountToSplits()` sends a split row 
count of `0`. On BE, though, `FileScanner` treats `table_level_row_count >= 0` 
as table-level count, while `IcebergReaderMixin::_init_row_filters()` only 
preserves the COUNT fast path for `table_level_row_count > 0`; with zero and 
delete descriptors it initializes the delete readers and resets the reader 
pushdown op to `NONE`, so the range never becomes the intended `CountReader(0)` 
path. Please either make the BE guard match the `>= 0` table-level count 
contract, or have this FE path fall back instead of returning zero until BE 
handles zero-count table-level pushdown consistently. A unit case for 
`summary("0", "N", "N")` with `ignoreDanglingDelete=true` would cover this 
boundary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to