hilaryRope opened a new pull request, #38099:
URL: https://github.com/apache/beam/pull/38099

   ## Fix GCS filesystem glob matching to handle `/` in object names and support `**`
   
   Fixes [#38059](https://github.com/apache/beam/issues/38059)
   
   ### Summary
   
   The GCS filesystem's `List()` method used `filepath.Match()` to filter objects. `filepath.Match()` treats `/` as a path separator, but GCS object names are flat, with `/` being just another character. As a result, patterns like `gs://bucket/**` failed to match objects whose names contain `/`.
   
   ### Problem
   
   - `filepath.Match` treats `/` as a path separator, so `*` cannot match across `/`
   - `filepath.Match` does not support `**` for recursive matching
   - `fileio.MatchFiles(scope, "gs://my-bucket/**")` silently excluded objects like `dir/subdir/file.txt`
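
The behaviour listed above can be reproduced outside Beam with a few lines of Go. This is a minimal sketch, assuming a Unix-like system where the path separator is `/`; `matchObject` is a hypothetical helper for illustration and is not part of the Beam code base.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchObject is a hypothetical helper: it reports whether filepath.Match
// accepts the object name, treating a malformed pattern as "no match".
func matchObject(pattern, object string) bool {
	ok, err := filepath.Match(pattern, object)
	return err == nil && ok
}

func main() {
	// On Unix-like systems "*" never crosses a "/", and "**" is just two
	// consecutive "*" wildcards, so it does not cross "/" either.
	fmt.Println(matchObject("*", "file.txt"))             // true
	fmt.Println(matchObject("*", "dir/subdir/file.txt"))  // false
	fmt.Println(matchObject("**", "dir/subdir/file.txt")) // false
}
```

The last call mirrors the `gs://my-bucket/**` case: the object name contains `/`, so the match fails without reporting an error.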
   
   ### Solution
   
   Replaced `filepath.Match` with a custom `globToRegex()` function:
   - `*` → matches any characters except `/` (single path segment)
   - `**` → matches any characters including `/` (recursive)
   - `**/` → matches zero or more path segments
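
The three rules above could be implemented roughly as follows. This is an illustrative sketch only, not the `globToRegex()` from this PR: the names `globToRegexSketch` and `matchGlob` are hypothetical, the pattern is assumed to be the object part of the path (without the `gs://bucket/` prefix), and other glob syntax such as `?` or character classes is treated as literal text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// globToRegexSketch translates a glob into an anchored regular expression.
// Hypothetical helper for illustration; only *, ** and **/ are special.
func globToRegexSketch(glob string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("(?s)^") // (?s) lets "." match newlines as well
	for i := 0; i < len(glob); {
		switch {
		case strings.HasPrefix(glob[i:], "**/"):
			// Zero or more complete path segments, each ending in "/".
			b.WriteString("(?:[^/]*/)*")
			i += 3
		case strings.HasPrefix(glob[i:], "**"):
			// Any characters, including "/".
			b.WriteString(".*")
			i += 2
		case glob[i] == '*':
			// Any characters within a single segment.
			b.WriteString("[^/]*")
			i++
		default:
			// Copy one byte, escaping regex metacharacters.
			b.WriteString(regexp.QuoteMeta(glob[i : i+1]))
			i++
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchGlob reports whether name matches glob under the rules above.
func matchGlob(glob, name string) bool {
	re, err := globToRegexSketch(glob)
	return err == nil && re.MatchString(name)
}

func main() {
	fmt.Println(matchGlob("*", "dir/file.txt"))                // false
	fmt.Println(matchGlob("**", "dir/subdir/file.txt"))        // true
	fmt.Println(matchGlob("dir/**/file.txt", "dir/file.txt"))  // true
	fmt.Println(matchGlob("dir/**/file.txt", "dir/a/b/file.txt")) // true
	fmt.Println(matchGlob("dir/*.txt", "dir/a/b.txt"))         // false
}
```

Checking for `**/` before `**`, and `**` before `*`, is what keeps the longer tokens from being consumed as two single-segment wildcards.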
   
   This aligns the Go SDK with the Python and Java SDKs.
   
   ### Testing
   
   Added unit tests for `globToRegex()` and `List()` with nested object names.
   
   ---
   
   - [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable.
   - [x] Update `CHANGES.md` with noteworthy changes.
   - [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
