Copilot commented on code in PR #61843:
URL: https://github.com/apache/doris/pull/61843#discussion_r3004429869


##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -597,11 +606,30 @@ private static List<Character> 
expandBracketContent(String content) {
      */
     public static List<String> expandBracePatterns(String pathPattern) {
         List<String> result = new ArrayList<>();
-        expandBracePatternsRecursive(pathPattern, result);
+        expandBracePatternsRecursive(pathPattern, result, 0);
         return result;
     }
 
-    private static void expandBracePatternsRecursive(String pattern, 
List<String> result) {
+    /**
+     * Expand brace patterns with a limit on the number of expanded paths.
+     * Stops expansion early if the limit is exceeded, avoiding large 
allocations.
+     *
+     * @param pathPattern Path with optional brace patterns
+     * @param maxPaths Maximum number of expanded paths allowed (must be > 0)
+     * @return List of expanded concrete paths
+     * @throws BraceExpansionTooLargeException if expansion exceeds maxPaths
+     */
+    public static List<String> expandBracePatterns(String pathPattern, int 
maxPaths) {

Review Comment:
   `expandBracePatterns(String, int)` Javadoc says `maxPaths` must be > 0, but 
the implementation treats 0/negative as “unlimited”. Since callers pass 
`Config.s3_head_request_max_paths`, setting that config to 0 (which previously 
effectively disabled the HEAD optimization) would now remove the limit and 
could trigger unbounded expansion / excessive HEAD requests. Please clarify and 
enforce the contract (e.g., reject `maxPaths <= 0` in this overload and keep 
“unlimited” only for the 1-arg method, or update the config semantics/docs 
accordingly).
   ```suggestion
        * @throws BraceExpansionTooLargeException if expansion exceeds maxPaths
        * @throws IllegalArgumentException if {@code maxPaths} is not greater 
than zero
        */
       public static List<String> expandBracePatterns(String pathPattern, int 
maxPaths) {
           if (maxPaths <= 0) {
               throw new IllegalArgumentException("maxPaths must be > 0");
           }
   ```



##########
fe/fe-core/src/test/java/org/apache/doris/common/util/S3UtilTest.java:
##########
@@ -456,5 +456,46 @@ public void testExpandBracketPatterns_malformedBracket() {
         // Malformed bracket (no closing ]) - [ kept as literal
         Assert.assertEquals("file[abc.csv", 
S3Util.expandBracketPatterns("file[abc.csv"));
     }
+
+    // Tests for limit-aware expandBracePatterns
+
+    @Test
+    public void testExpandBracePatterns_withinLimit() {
+        // Expansion within the limit should succeed
+        List<String> result = S3Util.expandBracePatterns("file{1,2,3}.csv", 
10);
+        Assert.assertEquals(Arrays.asList("file1.csv", "file2.csv", 
"file3.csv"), result);
+    }
+
+    @Test
+    public void testExpandBracePatterns_exactlyAtLimit() {
+        // Expansion exactly at the limit should succeed
+        List<String> result = S3Util.expandBracePatterns("file{1,2,3}.csv", 3);
+        Assert.assertEquals(Arrays.asList("file1.csv", "file2.csv", 
"file3.csv"), result);
+    }
+
+    @Test(expected = S3Util.BraceExpansionTooLargeException.class)
+    public void testExpandBracePatterns_exceedsLimit() {
+        // Expansion exceeding the limit should throw
+        S3Util.expandBracePatterns("file{1,2,3,4,5}.csv", 3);
+    }
+
+    @Test(expected = S3Util.BraceExpansionTooLargeException.class)
+    public void testExpandBracePatterns_oneOverLimit() {
+        // maxPaths+1 items must also throw (boundary case)
+        S3Util.expandBracePatterns("file{1,2,3,4}.csv", 3);
+    }
+
+    @Test(expected = S3Util.BraceExpansionTooLargeException.class)
+    public void testExpandBracePatterns_cartesianExceedsLimit() {
+        // Cartesian product {a,b,c} x {1,2,3} = 9 paths, limit = 5
+        S3Util.expandBracePatterns("dir{a,b,c}/file{1,2,3}.csv", 5);
+    }
+
+    @Test
+    public void testExpandBracePatterns_zeroLimitMeansUnlimited() {
+        // maxPaths=0 means no limit (backward compatibility)
+        List<String> result = 
S3Util.expandBracePatterns("file{1,2,3,4,5}.csv", 0);
+        Assert.assertEquals(5, result.size());

Review Comment:
   This test asserts `maxPaths=0` means unlimited, but the new 
`expandBracePatterns(String, int)` Javadoc says `maxPaths` must be > 0 and 
`s3_head_request_max_paths` is documented as a hard cap for HEAD-path 
expansion. Please align the test with the intended semantics (either treat 0 as 
“disable/fallback” like the prior behavior, or update the API/config 
documentation and add safeguards if 0 truly means “unlimited”).
   ```suggestion
       @Test(expected = IllegalArgumentException.class)
       public void testExpandBracePatterns_zeroLimitIsInvalid() {
           // maxPaths must be > 0; zero is an invalid argument
           S3Util.expandBracePatterns("file{1,2,3,4,5}.csv", 0);
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -468,15 +472,15 @@ private Status globListByGetProperties(String bucket, 
String keyPattern,
             List<RemoteFile> result, boolean fileNameOnly, long startTime) {
         try {
             // First expand [...] brackets to {...} braces, then expand {..} 
ranges, then expand braces
+            // Use limit-aware expansion to avoid large allocations before 
checking the limit
             String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
             expandedPattern = S3Util.extendGlobs(expandedPattern);
-            List<String> expandedPaths = 
S3Util.expandBracePatterns(expandedPattern);
-
-            // Fall back to listing if too many paths to avoid overwhelming 
Azure with requests
-            // Controlled by config: s3_head_request_max_paths
-            if (expandedPaths.size() > Config.s3_head_request_max_paths) {
-                LOG.info("Expanded path count {} exceeds limit {}, falling 
back to LIST",
-                        expandedPaths.size(), 
Config.s3_head_request_max_paths);
+            List<String> expandedPaths;
+            try {
+                expandedPaths = S3Util.expandBracePatterns(expandedPattern, 
Config.s3_head_request_max_paths);

Review Comment:
   The new comment says this uses “limit-aware expansion to avoid large 
allocations before checking the limit”, but `S3Util.extendGlobs()` runs first 
and eagerly enumerates numeric ranges into a potentially huge brace string. To 
fully address the OOM/CPU risk for patterns like `{1..100000}`, `extendGlobs` 
(range expansion) likely also needs a limit-aware/short-circuit path keyed off 
`s3_head_request_max_paths`.



##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -444,11 +446,13 @@ public Status globList(String remotePath, 
List<RemoteFile> result, boolean fileN
             st = new Status(Status.ErrCode.COMMON_ERROR,
                     "errors while glob file " + remotePath + ": " + 
e.getMessage());
         } finally {
-            long endTime = System.nanoTime();
-            long duration = endTime - startTime;
-            LOG.info("process {} elements under prefix {} for {} round, match 
{} elements, take {} micro second",
-                    remotePath, elementCnt, roundCnt, matchCnt,
-                    duration / 1000);
+            if (!usedHeadPath) {
+                long endTime = System.nanoTime();
+                long duration = endTime - startTime;
+                LOG.info("process {} elements under prefix {} for {} round, 
match {} elements, take {} micro second",
+                        remotePath, elementCnt, roundCnt, matchCnt,

Review Comment:
   The log format placeholders don’t match the arguments: the message starts 
with “process {} elements under prefix {}…”, but the first argument passed is 
`remotePath` (a String) and `elementCnt` is second. Swap the first two 
arguments so the element count is logged in the first placeholder and the 
prefix/path in the second.
   ```suggestion
                           elementCnt, remotePath, roundCnt, matchCnt,
   ```



##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java:
##########
@@ -756,15 +758,15 @@ private GlobListResult globListByHeadRequests(String 
bucket, String keyPattern,
             List<RemoteFile> result, boolean fileNameOnly, long startTime) {
         try {
             // First expand [...] brackets to {...} braces, then expand {..} 
ranges, then expand braces
+            // Use limit-aware expansion to avoid large allocations before 
checking the limit
             String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
             expandedPattern = S3Util.extendGlobs(expandedPattern);
-            List<String> expandedPaths = 
S3Util.expandBracePatterns(expandedPattern);
-
-            // Fall back to listing if too many paths to avoid overwhelming S3 
with HEAD requests
-            // Controlled by config: s3_head_request_max_paths
-            if (expandedPaths.size() > Config.s3_head_request_max_paths) {
-                LOG.info("Expanded path count {} exceeds limit {}, falling 
back to LIST",
-                        expandedPaths.size(), 
Config.s3_head_request_max_paths);
+            List<String> expandedPaths;
+            try {
+                expandedPaths = S3Util.expandBracePatterns(expandedPattern, 
Config.s3_head_request_max_paths);

Review Comment:
   The new comment says this uses “limit-aware expansion to avoid large 
allocations before checking the limit”, but this code still calls 
`S3Util.extendGlobs()` first, which eagerly enumerates numeric ranges (e.g. 
`{1..100000}`) into a large in-memory string/list. If the goal is to mitigate 
OOM/CPU risks, consider making `extendGlobs`/range expansion limit-aware too 
(or short-circuit when the range cardinality would exceed 
`s3_head_request_max_paths`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to