ernestprovo23 opened a new pull request, #49372: URL: https://github.com/apache/arrow/pull/49372
### Rationale Closes https://github.com/apache/arrow/issues/41365 When `_resolve_filesystem_and_path()` receives an S3 URI containing spaces (e.g. `s3://bucket/path with space/file`), the C++ URI parser raises `ValueError("Cannot parse URI")`. The Python code previously caught **all** `"Cannot parse URI"` errors and silently fell back to `LocalFileSystem`, producing a confusing downstream error instead of the real parsing failure. ### What changed 1. **Added `_is_likely_uri()` helper** — a Python port of the C++ `IsLikelyUri()` heuristic from `cpp/src/arrow/filesystem/path_util.cc`. It checks whether a string has a valid URI scheme prefix (2-36 chars, RFC 3986 compliant). 2. **Modified error handling in `_resolve_filesystem_and_path()`** — the `except ValueError` block now distinguishes: - `"empty scheme"` → still falls back to `LocalFileSystem` (no scheme at all) - `"Cannot parse URI"` + `_is_likely_uri()` returns `False` → still falls back (not a URI, just a local path) - `"Cannot parse URI"` + `_is_likely_uri()` returns `True` → **re-raises** the error (it IS a URI, just malformed) 3. **Changed `raise e` to `raise`** — preserves the original traceback (minor style improvement). ### Tests added - `test_is_likely_uri()` — unit tests for the heuristic covering valid schemes (`s3://`, `gs://`, `hdfs://`, `abfss://`, `grpc+https://`), local paths, Windows drives, and invalid prefixes. - `test_resolve_filesystem_and_path_uri_with_spaces()` — verifies S3/GCS/ABFSS URIs with spaces now raise `ValueError`. - `test_resolve_filesystem_and_path_local_with_spaces()` — verifies local paths with spaces still return `LocalFileSystem`. ### Test plan - [x] All new tests pass (14/14) - [x] Manual verification: `_resolve_filesystem_and_path("s3://bucket/path with space")` raises `ValueError` - [x] Manual verification: `_resolve_filesystem_and_path("/local/path with space")` returns `LocalFileSystem` - [ ] Existing `test_fs.py` tests pass (requires full C++ build environment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
