Copilot commented on code in PR #60512:
URL: https://github.com/apache/doris/pull/60512#discussion_r2765171688


##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -436,6 +451,85 @@ public Status globList(String remotePath, List<RemoteFile> 
result, boolean fileN
         return st;
     }
 
+    /**
+     * Get file metadata using getProperties requests for deterministic paths.
+     * This avoids requiring list permission when only read permission is 
granted.
+     *
+     * @param bucket       Azure container name
+     * @param keyPattern   The key pattern (may contain {..} brace patterns 
but no wildcards)
+     * @param result       List to store matching RemoteFile objects
+     * @param fileNameOnly If true, only store file names; otherwise store 
full paths
+     * @param startTime    Start time for logging duration
+     * @return Status if successful, null if should fall back to listing
+     */
+    private Status globListByGetProperties(String bucket, String keyPattern,
+            List<RemoteFile> result, boolean fileNameOnly, long startTime) {
+        try {
+            // First expand any {..} patterns, then use getProperties requests
+            String expandedPattern = S3Util.extendGlobs(keyPattern);
+            List<String> expandedPaths = 
S3Util.expandBracePatterns(expandedPattern);
+
+            // Fall back to listing if too many paths to avoid overwhelming 
Azure with requests
+            // Controlled by config: s3_head_request_max_paths
+            if (expandedPaths.size() > Config.s3_head_request_max_paths) {
+                LOG.info("Expanded path count {} exceeds limit {}, falling 
back to LIST",
+                        expandedPaths.size(), 
Config.s3_head_request_max_paths);
+                return null;
+            }
+
+            if (LOG.isDebugEnabled()) {
+                LOG.debug("Using getProperties requests for deterministic path 
pattern, expanded to {} paths",
+                        expandedPaths.size());
+            }
+
+            BlobContainerClient containerClient = 
getClient().getBlobContainerClient(bucket);
+            long matchCnt = 0;
+            for (String key : expandedPaths) {
+                String fullPath = constructS3Path(key, bucket);
+                try {
+                    BlobClient blobClient = containerClient.getBlobClient(key);
+                    BlobProperties props = blobClient.getProperties();
+
+                    matchCnt++;
+                    RemoteFile remoteFile = new RemoteFile(
+                            fileNameOnly ? 
Paths.get(key).getFileName().toString() : fullPath,
+                            true, // isFile
+                            props.getBlobSize(),
+                            props.getBlobSize(),
+                            props.getLastModified() != null
+                                    ? props.getLastModified().toEpochSecond() 
: 0

Review Comment:
   Inconsistent timestamp conversion method. The existing code in this file 
uses `getSecond()` (lines 426, 561) for Azure blob timestamps, but this new 
code uses `toEpochSecond()`. This inconsistency could lead to different 
timestamp values being returned for the same type of data depending on which 
code path is used. Use `getSecond()` instead to match the existing convention.
   ```suggestion
                                       ? props.getLastModified().getSecond() : 0
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -4652,6 +4662,35 @@ public boolean isEnablePushDownStringMinMax() {
         return enablePushDownStringMinMax;
     }
 
+    public String getEnableMorValuePredicatePushdownTables() {
+        return enableMorValuePredicatePushdownTables;
+    }
+
+    /**
+     * Check if a table is enabled for MOR value predicate pushdown.
+     * @param dbName database name
+     * @param tableName table name
+     * @return true if the table is in the enabled list or if '*' is set
+     */
+    public boolean isMorValuePredicatePushdownEnabled(String dbName, String 
tableName) {
+        if (enableMorValuePredicatePushdownTables == null
+                || enableMorValuePredicatePushdownTables.isEmpty()) {
+            return false;
+        }
+        String trimmed = enableMorValuePredicatePushdownTables.trim();
+        if ("*".equals(trimmed)) {
+            return true;
+        }
+        String fullName = dbName + "." + tableName;
+        for (String table : trimmed.split(",")) {
+            if (table.trim().equalsIgnoreCase(fullName)
+                    || table.trim().equalsIgnoreCase(tableName)) {

Review Comment:
   Performance concern with repeated string parsing. The 
isMorValuePredicatePushdownEnabled method splits and trims the comma-separated 
table list on every invocation. For queries that scan multiple MOR tables, this 
parsing happens repeatedly. Consider caching the parsed table set when the 
session variable is set, or at minimum move the split operation outside the 
loop to avoid repeated splitting.
   ```suggestion
               String trimmedTable = table.trim();
               if (trimmedTable.equalsIgnoreCase(fullName)
                       || trimmedTable.equalsIgnoreCase(tableName)) {
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -4652,6 +4662,35 @@ public boolean isEnablePushDownStringMinMax() {
         return enablePushDownStringMinMax;
     }
 
+    public String getEnableMorValuePredicatePushdownTables() {
+        return enableMorValuePredicatePushdownTables;
+    }
+
+    /**
+     * Check if a table is enabled for MOR value predicate pushdown.
+     * @param dbName database name
+     * @param tableName table name
+     * @return true if the table is in the enabled list or if '*' is set
+     */
+    public boolean isMorValuePredicatePushdownEnabled(String dbName, String 
tableName) {
+        if (enableMorValuePredicatePushdownTables == null
+                || enableMorValuePredicatePushdownTables.isEmpty()) {
+            return false;
+        }
+        String trimmed = enableMorValuePredicatePushdownTables.trim();
+        if ("*".equals(trimmed)) {
+            return true;
+        }

Review Comment:
   Missing null check for dbName parameter. If olapTable.getQualifiedDbName() 
returns null, the string concatenation at line 4684 will produce 
"null.tableName" instead of failing gracefully or handling this case 
appropriately. Add a null check for dbName before using it in the string 
concatenation.
   ```suggestion
           }
           if (dbName == null || tableName == null) {
               return false;
           }
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java:
##########
@@ -578,6 +579,23 @@ private GlobListResult globListInternal(String remotePath, 
List<RemoteFile> resu
             }
 
             bucket = uri.getBucket();
+
+            // Optimization: For deterministic paths (no wildcards like *, ?, 
[...]),
+            // use HEAD requests instead of listing to avoid requiring 
ListBucket permission.
+            // This is useful when only GetObject permission is granted.
+            // Controlled by config: s3_skip_list_for_deterministic_path
+            String keyPattern = uri.getKey();
+            if (Config.s3_skip_list_for_deterministic_path
+                    && S3Util.isDeterministicPattern(keyPattern)
+                    && !hasLimits && startFile == null) {
+                GlobListResult headResult = globListByHeadRequests(
+                        bucket, keyPattern, result, fileNameOnly, startTime);
+                if (headResult != null) {
+                    return headResult;
+                }
+                // If headResult is null, fall through to use listing
+            }

Review Comment:
   The PR description does not mention the S3/Azure optimization features that 
add HEAD request support for deterministic paths. This PR appears to include 
two unrelated features: (1) MOR value predicate pushdown control, and (2) 
S3/Azure deterministic path optimization. These should ideally be in separate 
PRs for better code review and maintainability. At minimum, the PR description 
should be updated to reflect all changes being made.



##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -4652,6 +4662,35 @@ public boolean isEnablePushDownStringMinMax() {
         return enablePushDownStringMinMax;
     }
 
+    public String getEnableMorValuePredicatePushdownTables() {
+        return enableMorValuePredicatePushdownTables;
+    }
+
+    /**
+     * Check if a table is enabled for MOR value predicate pushdown.
+     * @param dbName database name
+     * @param tableName table name
+     * @return true if the table is in the enabled list or if '*' is set
+     */
+    public boolean isMorValuePredicatePushdownEnabled(String dbName, String 
tableName) {
+        if (enableMorValuePredicatePushdownTables == null
+                || enableMorValuePredicatePushdownTables.isEmpty()) {
+            return false;
+        }
+        String trimmed = enableMorValuePredicatePushdownTables.trim();
+        if ("*".equals(trimmed)) {
+            return true;
+        }
+        String fullName = dbName + "." + tableName;
+        for (String table : trimmed.split(",")) {
+            if (table.trim().equalsIgnoreCase(fullName)
+                    || table.trim().equalsIgnoreCase(tableName)) {
+                return true;
+            }
+        }
+        return false;
+    }

Review Comment:
   Missing test coverage for the MOR value predicate pushdown feature. The new 
session variable isMorValuePredicatePushdownEnabled() method includes parsing 
logic for comma-separated table lists, wildcard handling, and case-insensitive 
matching. This logic should have unit tests to verify correct behavior for 
various input formats (empty string, single table, multiple tables, wildcard, 
table name only vs qualified name, etc.).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to