hilaryRope opened a new pull request, #38099: URL: https://github.com/apache/beam/pull/38099
## Fix GCS filesystem glob matching to handle `/` in object names and support `**`

Fixes [#38059](https://github.com/apache/beam/issues/38059)

### Summary

The GCS filesystem's `List()` method (`sdks/go/pkg/beam/io/filesystem/gcs/gcs.go`) was using `filepath.Match()` to filter objects, which incorrectly treats `/` as a path separator. Since GCS object names are flat (with `/` being just another character), patterns like `gs://bucket/**` failed to match objects containing `/`.

### Problem

- `filepath.Match` treats `/` as a path separator, so `*` cannot match across `/`
- `filepath.Match` does not support `**` for recursive matching
- `fileio.MatchFiles(scope, "gs://my-bucket/**")` silently excluded objects like `dir/subdir/file.txt`

### Solution

Replaced `filepath.Match` with a custom `globToRegex()` function:

- `*` → matches any characters except `/` (single path segment)
- `**` → matches any characters including `/` (recursive)
- `**/` → matches zero or more path segments

This aligns the Go SDK with the Python and Java SDKs.

### Testing

Added unit tests for `globToRegex()` and `List()` with nested object names.
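
For readers who want to see the three translation rules listed under Solution in concrete form, here is a minimal sketch. It is not the `globToRegex()` from this PR: the names `globToRegexSketch` and `matchObject` are hypothetical, the pattern is assumed to be the object-name part only (bucket already stripped), and other glob metacharacters such as `?` and `[...]` are treated as literals to keep the example short.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// globToRegexSketch is a hypothetical helper (not the PR's globToRegex).
// It translates a glob into an anchored regular expression using only the
// three rules from the PR description:
//
//	"**/" -> zero or more whole path segments
//	"**"  -> any characters, including '/'
//	"*"   -> any characters except '/'
//
// Everything else is matched literally.
func globToRegexSketch(glob string) (*regexp.Regexp, error) {
	var b strings.Builder
	// (?s) lets '.' match newlines too, since object names are arbitrary strings.
	b.WriteString("(?s)^")
	for i := 0; i < len(glob); {
		rest := glob[i:]
		switch {
		case strings.HasPrefix(rest, "**/"):
			b.WriteString("(?:.*/)?")
			i += 3
		case strings.HasPrefix(rest, "**"):
			b.WriteString(".*")
			i += 2
		case rest[0] == '*':
			b.WriteString("[^/]*")
			i++
		default:
			// Copy one full rune, escaping regex metacharacters.
			r, size := utf8.DecodeRuneInString(rest)
			b.WriteString(regexp.QuoteMeta(string(r)))
			i += size
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchObject reports whether the object name matches the glob.
// A pattern that fails to compile is treated as matching nothing.
func matchObject(glob, name string) bool {
	re, err := globToRegexSketch(glob)
	if err != nil {
		return false
	}
	return re.MatchString(name)
}

func main() {
	fmt.Println(matchObject("**", "dir/subdir/file.txt"))          // true
	fmt.Println(matchObject("*", "dir/subdir/file.txt"))           // false
	fmt.Println(matchObject("**/file.txt", "file.txt"))            // true
	fmt.Println(matchObject("**/file.txt", "dir/subdir/file.txt")) // true
	fmt.Println(matchObject("dir/*", "dir/subdir/file.txt"))       // false
}
```

Checking `**/` before `**` and `*` matters: otherwise `**/file.txt` would require at least one `/` and would miss a top-level `file.txt`.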
---

- [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable.
- [x] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
