EslamAhmed171 opened a new issue, #48325:
URL: https://github.com/apache/arrow/issues/48325

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   I'm running the PyArrow tests on Windows with a username that contains a 
space ("Eslam Ahmed"), and I'm encountering **9 test failures**. Upon 
investigation, **8 failures** are caused by the `_filesystem_uri()` helper 
function in `pyarrow/tests/util.py`, and **1 failure** is caused by similar 
manual URI construction in `pyarrow/tests/parquet/test_metadata.py`.
   
   **Example error:**
   ```
   pyarrow.lib.ArrowInvalid: Cannot parse URI: 'file:///C:/Users/Eslam 
Ahmed/AppData/Local/Temp/pytest-of-Eslam Ahmed/pytest-2/test_filesystem_uri0' 
   due to syntax error at character ' ' (position 22)
   ```
   
   ### Root Cause
   
   The `_filesystem_uri()` function in `pyarrow/tests/util.py` interpolates the 
path into the URI as-is, so spaces are never percent-encoded:
   
   **Current problematic code:**
   ```python
   def _filesystem_uri(path):
       # URIs on Windows must follow 'file:///C:...' or 'file:/C:...' patterns.
       if os.name == 'nt':
           uri = f'file:///{path}'
       else:
           uri = f'file://{path}'
       return uri
   ```
   
   This creates URIs like `file:///C:/Users/Eslam Ahmed/test` instead of 
`file:///C:/Users/Eslam%20Ahmed/test`, violating RFC 3986, which requires 
spaces to be percent-encoded.
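
To make the difference concrete, here is a minimal sketch using `urllib.parse.quote` from the standard library. The path is the example from above, and `naive_uri` / `encoded_uri` are illustrative names, not identifiers from the PyArrow test suite.

```python
from urllib.parse import quote

# Example path with a space, written with forward slashes.
path = "C:/Users/Eslam Ahmed/test"

# What the current helper effectively does on Windows: no encoding at all.
naive_uri = f"file:///{path}"

# Percent-encode the path; keep '/' and the drive-letter ':' unescaped.
encoded_uri = "file:///" + quote(path, safe="/:")

print(naive_uri)
print(encoded_uri)
```

The first URI still contains a literal space, which is what the URI parser rejects; the second has it encoded as `%20`.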
   
   ### Impact
   
   This affects:
   1. Users with spaces in their Windows usernames
   2. The test suite (9 tests fail when the username contains a space)
   3. Any code using `_filesystem_uri()` with paths containing special 
characters
   
   ### Proposed Solution
   
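   Use `pathlib.Path.as_uri()` from the standard library, which handles proper 
percent-encoding. Note that it raises `ValueError` for relative paths, so 
callers must pass absolute paths: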
   
   **For `_filesystem_uri()` in `pyarrow/tests/util.py`:**
   ```python
   from pathlib import Path
   
   def _filesystem_uri(path):
       return Path(path).as_uri()
   ```
   
   This automatically handles:
   - Proper percent-encoding of spaces and special characters
   - The correct `file:///` form on both Windows and Unix, so the `os.name` 
branch is no longer needed
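
As a rough check of the behaviour described above, the sketch below builds an absolute path containing a space and converts it with `Path.as_uri()`. `filesystem_uri_sketch` is a hypothetical stand-in for the helper, not the code in `pyarrow/tests/util.py`, and the `.absolute()` call is an added assumption to guard against relative inputs.

```python
import tempfile
from pathlib import Path


def filesystem_uri_sketch(path):
    """Hypothetical stand-in for _filesystem_uri(), for illustration only."""
    # as_uri() raises ValueError for relative paths, so make the path
    # absolute first (assumption: callers may pass relative paths).
    return Path(path).absolute().as_uri()


# The directory does not need to exist; as_uri() only formats the path.
path_with_space = Path(tempfile.gettempdir()) / "Eslam Ahmed" / "test"
uri = filesystem_uri_sketch(path_with_space)
print(uri)
```

The exact output depends on the platform's temporary directory, but the space in `Eslam Ahmed` should appear as `%20` and the URI should start with `file:///` on both Windows and Unix.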
   
   **For the manual URI construction in 
`pyarrow/tests/parquet/test_metadata.py` (line ~690):**
   ```python
   # From:
   file_uri = 'file:///' + file_path
   
   # To:
   from pathlib import Path
   file_uri = Path(file_path).as_uri()
   ```
   
   ### Environment
   
   - **OS**: Windows 11
   - **PyArrow version**: main branch (development build)
   - **Python version**: 3.13
   
   I'm happy to submit a PR with the fix if this approach looks good to the 
maintainers.
   
   ### Component(s)
   
   Python

