ernestprovo23 opened a new pull request, #49372:
URL: https://github.com/apache/arrow/pull/49372

   ### Rationale
   
   Closes https://github.com/apache/arrow/issues/41365
   
   When `_resolve_filesystem_and_path()` receives an S3 URI containing spaces 
(e.g. `s3://bucket/path with space/file`), the C++ URI parser raises 
`ValueError("Cannot parse URI")`. The Python code previously caught **all** 
`"Cannot parse URI"` errors and silently fell back to `LocalFileSystem`, 
producing a confusing downstream error instead of the real parsing failure.
   
   ### What changed
   
   1. **Added `_is_likely_uri()` helper** — a Python port of the C++ 
`IsLikelyUri()` heuristic from `cpp/src/arrow/filesystem/path_util.cc`. It 
checks whether a string has a valid URI scheme prefix (2-36 chars, RFC 3986 
compliant).
   
   2. **Modified error handling in `_resolve_filesystem_and_path()`** — the 
`except ValueError` block now distinguishes:
      - `"empty scheme"` → still falls back to `LocalFileSystem` (no scheme at 
all)
      - `"Cannot parse URI"` + `_is_likely_uri()` returns `False` → still falls 
back (not a URI, just a local path)
      - `"Cannot parse URI"` + `_is_likely_uri()` returns `True` → 
**re-raises** the error (it IS a URI, just malformed)
   
   3. **Changed `raise e` to `raise`** — preserves the original traceback 
(minor style improvement).
   
   ### Tests added
   
   - `test_is_likely_uri()` — unit tests for the heuristic covering valid 
schemes (`s3://`, `gs://`, `hdfs://`, `abfss://`, `grpc+https://`), local 
paths, Windows drives, and invalid prefixes.
   - `test_resolve_filesystem_and_path_uri_with_spaces()` — verifies 
S3/GCS/ABFSS URIs with spaces now raise `ValueError`.
   - `test_resolve_filesystem_and_path_local_with_spaces()` — verifies local 
paths with spaces still return `LocalFileSystem`.
   
   ### Test plan
   
   - [x] All new tests pass (14/14)
   - [x] Manual verification: `_resolve_filesystem_and_path("s3://bucket/path 
with space")` raises `ValueError`
   - [x] Manual verification: `_resolve_filesystem_and_path("/local/path with 
space")` returns `LocalFileSystem`
   - [ ] Existing `test_fs.py` tests pass (requires full C++ build environment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to