Copilot commented on code in PR #61843:
URL: https://github.com/apache/doris/pull/61843#discussion_r3004429869
##########
fe/fe-core/src/main/java/org/apache/doris/common/util/S3Util.java:
##########
@@ -597,11 +606,30 @@ private static List<Character>
expandBracketContent(String content) {
*/
public static List<String> expandBracePatterns(String pathPattern) {
List<String> result = new ArrayList<>();
- expandBracePatternsRecursive(pathPattern, result);
+ expandBracePatternsRecursive(pathPattern, result, 0);
return result;
}
- private static void expandBracePatternsRecursive(String pattern,
List<String> result) {
+ /**
+ * Expand brace patterns with a limit on the number of expanded paths.
+ * Stops expansion early if the limit is exceeded, avoiding large
allocations.
+ *
+ * @param pathPattern Path with optional brace patterns
+ * @param maxPaths Maximum number of expanded paths allowed (must be > 0)
+ * @return List of expanded concrete paths
+ * @throws BraceExpansionTooLargeException if expansion exceeds maxPaths
+ */
+ public static List<String> expandBracePatterns(String pathPattern, int
maxPaths) {
Review Comment:
`expandBracePatterns(String, int)` Javadoc says `maxPaths` must be > 0, but
the implementation treats 0/negative as “unlimited”. Since callers pass
`Config.s3_head_request_max_paths`, setting that config to 0 (which previously
effectively disabled the HEAD optimization) would now remove the limit and
could trigger unbounded expansion / excessive HEAD requests. Please clarify and
enforce the contract (e.g., reject `maxPaths <= 0` in this overload and keep
“unlimited” only for the 1-arg method, or update the config semantics/docs
accordingly).
```suggestion
* @throws BraceExpansionTooLargeException if expansion exceeds maxPaths
* @throws IllegalArgumentException if {@code maxPaths} is not greater
than zero
*/
public static List<String> expandBracePatterns(String pathPattern, int
maxPaths) {
if (maxPaths <= 0) {
throw new IllegalArgumentException("maxPaths must be > 0");
}
```
##########
fe/fe-core/src/test/java/org/apache/doris/common/util/S3UtilTest.java:
##########
@@ -456,5 +456,46 @@ public void testExpandBracketPatterns_malformedBracket() {
// Malformed bracket (no closing ]) - [ kept as literal
Assert.assertEquals("file[abc.csv",
S3Util.expandBracketPatterns("file[abc.csv"));
}
+
+ // Tests for limit-aware expandBracePatterns
+
+ @Test
+ public void testExpandBracePatterns_withinLimit() {
+ // Expansion within the limit should succeed
+ List<String> result = S3Util.expandBracePatterns("file{1,2,3}.csv",
10);
+ Assert.assertEquals(Arrays.asList("file1.csv", "file2.csv",
"file3.csv"), result);
+ }
+
+ @Test
+ public void testExpandBracePatterns_exactlyAtLimit() {
+ // Expansion exactly at the limit should succeed
+ List<String> result = S3Util.expandBracePatterns("file{1,2,3}.csv", 3);
+ Assert.assertEquals(Arrays.asList("file1.csv", "file2.csv",
"file3.csv"), result);
+ }
+
+ @Test(expected = S3Util.BraceExpansionTooLargeException.class)
+ public void testExpandBracePatterns_exceedsLimit() {
+ // Expansion exceeding the limit should throw
+ S3Util.expandBracePatterns("file{1,2,3,4,5}.csv", 3);
+ }
+
+ @Test(expected = S3Util.BraceExpansionTooLargeException.class)
+ public void testExpandBracePatterns_oneOverLimit() {
+ // maxPaths+1 items must also throw (boundary case)
+ S3Util.expandBracePatterns("file{1,2,3,4}.csv", 3);
+ }
+
+ @Test(expected = S3Util.BraceExpansionTooLargeException.class)
+ public void testExpandBracePatterns_cartesianExceedsLimit() {
+ // Cartesian product {a,b,c} x {1,2,3} = 9 paths, limit = 5
+ S3Util.expandBracePatterns("dir{a,b,c}/file{1,2,3}.csv", 5);
+ }
+
+ @Test
+ public void testExpandBracePatterns_zeroLimitMeansUnlimited() {
+ // maxPaths=0 means no limit (backward compatibility)
+ List<String> result =
S3Util.expandBracePatterns("file{1,2,3,4,5}.csv", 0);
+ Assert.assertEquals(5, result.size());
Review Comment:
This test asserts `maxPaths=0` means unlimited, but the new
`expandBracePatterns(String, int)` Javadoc says `maxPaths` must be > 0 and
`s3_head_request_max_paths` is documented as a hard cap for HEAD-path
expansion. Please align the test with the intended semantics (either treat 0 as
“disable/fallback” like the prior behavior, or update the API/config
documentation and add safeguards if 0 truly means “unlimited”).
```suggestion
@Test(expected = IllegalArgumentException.class)
public void testExpandBracePatterns_zeroLimitIsInvalid() {
// maxPaths must be > 0; zero is an invalid argument
S3Util.expandBracePatterns("file{1,2,3,4,5}.csv", 0);
```
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -468,15 +472,15 @@ private Status globListByGetProperties(String bucket,
String keyPattern,
List<RemoteFile> result, boolean fileNameOnly, long startTime) {
try {
// First expand [...] brackets to {...} braces, then expand {..}
ranges, then expand braces
+ // Use limit-aware expansion to avoid large allocations before
checking the limit
String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
expandedPattern = S3Util.extendGlobs(expandedPattern);
- List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
-
- // Fall back to listing if too many paths to avoid overwhelming
Azure with requests
- // Controlled by config: s3_head_request_max_paths
- if (expandedPaths.size() > Config.s3_head_request_max_paths) {
- LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
- expandedPaths.size(),
Config.s3_head_request_max_paths);
+ List<String> expandedPaths;
+ try {
+ expandedPaths = S3Util.expandBracePatterns(expandedPattern,
Config.s3_head_request_max_paths);
Review Comment:
The new comment says this uses “limit-aware expansion to avoid large
allocations before checking the limit”, but `S3Util.extendGlobs()` runs first
and eagerly enumerates numeric ranges into a potentially huge brace string. To
fully address the OOM/CPU risk for patterns like `{1..100000}`, `extendGlobs`
(range expansion) likely also needs a limit-aware/short-circuit path keyed off
`s3_head_request_max_paths`.
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/AzureObjStorage.java:
##########
@@ -444,11 +446,13 @@ public Status globList(String remotePath,
List<RemoteFile> result, boolean fileN
st = new Status(Status.ErrCode.COMMON_ERROR,
"errors while glob file " + remotePath + ": " +
e.getMessage());
} finally {
- long endTime = System.nanoTime();
- long duration = endTime - startTime;
- LOG.info("process {} elements under prefix {} for {} round, match
{} elements, take {} micro second",
- remotePath, elementCnt, roundCnt, matchCnt,
- duration / 1000);
+ if (!usedHeadPath) {
+ long endTime = System.nanoTime();
+ long duration = endTime - startTime;
+ LOG.info("process {} elements under prefix {} for {} round,
match {} elements, take {} micro second",
+ remotePath, elementCnt, roundCnt, matchCnt,
Review Comment:
The log format placeholders don’t match the arguments: the message starts
with “process {} elements under prefix {}…”, but the first argument passed is
`remotePath` (a String) and `elementCnt` is second. Swap the first two
arguments so the element count is logged in the first placeholder and the
prefix/path in the second.
```suggestion
elementCnt, remotePath, roundCnt, matchCnt,
```
##########
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java:
##########
@@ -756,15 +758,15 @@ private GlobListResult globListByHeadRequests(String
bucket, String keyPattern,
List<RemoteFile> result, boolean fileNameOnly, long startTime) {
try {
// First expand [...] brackets to {...} braces, then expand {..}
ranges, then expand braces
+ // Use limit-aware expansion to avoid large allocations before
checking the limit
String expandedPattern = S3Util.expandBracketPatterns(keyPattern);
expandedPattern = S3Util.extendGlobs(expandedPattern);
- List<String> expandedPaths =
S3Util.expandBracePatterns(expandedPattern);
-
- // Fall back to listing if too many paths to avoid overwhelming S3
with HEAD requests
- // Controlled by config: s3_head_request_max_paths
- if (expandedPaths.size() > Config.s3_head_request_max_paths) {
- LOG.info("Expanded path count {} exceeds limit {}, falling
back to LIST",
- expandedPaths.size(),
Config.s3_head_request_max_paths);
+ List<String> expandedPaths;
+ try {
+ expandedPaths = S3Util.expandBracePatterns(expandedPattern,
Config.s3_head_request_max_paths);
Review Comment:
The new comment says this uses “limit-aware expansion to avoid large
allocations before checking the limit”, but this code still calls
`S3Util.extendGlobs()` first, which eagerly enumerates numeric ranges (e.g.
`{1..100000}`) into a large in-memory string/list. If the goal is to mitigate
OOM/CPU risks, consider making `extendGlobs`/range expansion limit-aware too
(or short-circuit when the range cardinality would exceed
`s3_head_request_max_paths`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]