github-actions[bot] commented on code in PR #64684:
URL: https://github.com/apache/doris/pull/64684#discussion_r3450914756


##########
fe/fe-filesystem/fe-filesystem-spi/src/main/java/org/apache/doris/filesystem/spi/S3CompatibleFileSystem.java:
##########
@@ -847,6 +849,247 @@ protected static String longestNonGlobPrefix(String globPattern) {
         return globPattern.substring(0, earliest);
     }
 
+    /**
+     * Returns object-store list prefixes that are safe to push down for a glob pattern.
+     *
+     * <p>Unlike {@link #longestNonGlobPrefix(String)}, this expands bounded glob constructs
+     * ({@code {...}} alternation and positive {@code [...]} character classes) before the first
+     * unbounded wildcard. That lets patterns such as
+     * {@code date=2025-{0[3-9],1[0-2]}-01/mp_id=8/*} list the concrete date/mp prefixes instead
+     * of scanning everything under {@code date=2025-}. If expansion would be too large or a glob
+     * construct is not safely enumerable, it falls back to the conservative longest static prefix.
+     */
+    protected static List<String> expandedGlobListPrefixes(String globPattern) {
+        List<String> prefixes = expandGlobListPrefixes(globPattern, true);
+        return prefixes == null ? List.of(longestNonGlobPrefix(globPattern)) : prefixes;
+    }
+
+    private static List<String> expandGlobListPrefixes(String globPattern, boolean allowPartialPrefix) {
+        List<String> prefixes = new ArrayList<>();
+        prefixes.add("");
+        int i = 0;
+        while (i < globPattern.length()) {
+            char c = globPattern.charAt(i);
+            if (c == '*' || c == '?') {

Review Comment:
   This recursive expansion is only safe when every brace arm is fully
   enumerable. Currently, `expandGlobListPrefixes(alternative, false)` still
   returns a partial prefix when it encounters `*` or `?` (lines 873-875), and
   the caller then appends the suffix that follows the brace. For example,
   `data/{foo*,bar*}/part.parquet` produces list prefixes like
   `data/foo/part.parquet` and `data/bar/part.parquet`. Object-store list
   prefixes are matched literally, so an object such as
   `data/foobar/part.parquet` matches the glob regex but is never listed. If an
   alternative hits an unbounded wildcard while `allowPartialPrefix` is false,
   the brace expansion needs to fail and fall back to the conservative outer
   prefix instead of appending the outer suffix to that partial arm.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

