EslamAhmed171 opened a new issue, #48325:
URL: https://github.com/apache/arrow/issues/48325
### Describe the bug, including details regarding any error messages,
version, and platform.
### Describe the bug
I'm running PyArrow tests on Windows with a username that contains a space
("Eslam Ahmed"), and I'm encountering **9 test failures**. Upon investigation,
**8 failures** are caused by the `_filesystem_uri()` helper function in
`pyarrow/tests/util.py`, and **1 failure** is caused by similar manual URI
construction in a test file.
**Example error:**
```
pyarrow.lib.ArrowInvalid: Cannot parse URI: 'file:///C:/Users/Eslam Ahmed/AppData/Local/Temp/pytest-of-Eslam Ahmed/pytest-2/test_filesystem_uri0' due to syntax error at character ' ' (position 22)
```
### Root Cause
The issue is in the `_filesystem_uri()` function in `pyarrow/tests/util.py`
which doesn't URL-encode spaces:
**Current problematic code:**
```python
def _filesystem_uri(path):
    # URIs on Windows must follow 'file:///C:...' or 'file:/C:...' patterns.
    if os.name == 'nt':
        uri = f'file:///{path}'
    else:
        uri = f'file://{path}'
    return uri
```
This creates URIs like `file:///C:/Users/Eslam Ahmed/test` instead of
`file:///C:/Users/Eslam%20Ahmed/test`. A literal space is not a valid URI
character under RFC 3986, so it must be percent-encoded as `%20`.
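To make the difference concrete, here is a minimal standalone sketch that does
not import PyArrow. The function names `naive_windows_file_uri` and
`quoted_windows_file_uri` are hypothetical and only used for illustration; the
first mirrors the Windows branch of the helper quoted above, and the second uses
`urllib.parse.quote` purely to show what the encoded form looks like (it is not
the fix proposed below).

```python
from urllib.parse import quote


def naive_windows_file_uri(path: str) -> str:
    """Mirror the Windows branch of the helper above: plain concatenation."""
    return f"file:///{path}"


def quoted_windows_file_uri(path: str) -> str:
    """Same concatenation, but percent-encode everything except '/' and ':'."""
    return "file:///" + quote(path, safe="/:")


path = "C:/Users/Eslam Ahmed/test"
naive = naive_windows_file_uri(path)
encoded = quoted_windows_file_uri(path)

print(naive)             # file:///C:/Users/Eslam Ahmed/test
print(naive.index(" "))  # 22, the position reported in the error message
print(encoded)           # file:///C:/Users/Eslam%20Ahmed/test
```

The index of the first space in the naive URI matches the "position 22" in the
`ArrowInvalid` message, which is consistent with the parser rejecting the URI at
the unencoded space.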
### Impact
This affects:
1. Users with spaces in their Windows usernames
2. The PyArrow test suite (9 tests fail in my run when the username contains a space)
3. Any code using `_filesystem_uri()` with paths containing special
characters
### Proposed Solution
Use Python's built-in `Path.as_uri()` which handles proper percent-encoding:
**For `_filesystem_uri()` in `pyarrow/tests/util.py`:**
```python
from pathlib import Path


def _filesystem_uri(path):
    return Path(path).as_uri()
```
This automatically handles:
- Percent-encoding of spaces and other characters that are not valid in a URI
- The correct `file:///C:/...` form on Windows and `file:///...` form on Unix,
  without a separate `os.name` check
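A quick way to check this behaviour without PyArrow is the sketch below. The
name `filesystem_uri_sketch` is a hypothetical stand-in for the proposed helper,
and the directory and file names are made up for the example. It also shows one
caveat I'm aware of: `Path.as_uri()` raises `ValueError` for relative paths, so
this approach assumes the tests always pass absolute paths.

```python
import tempfile
from pathlib import Path


def filesystem_uri_sketch(path):
    """Hypothetical stand-in for the proposed helper: delegate to pathlib."""
    return Path(path).as_uri()


# Build an absolute path with spaces under the system temp directory.
# The path does not need to exist; as_uri() only formats it.
base = Path(tempfile.gettempdir()).resolve()
spaced = base / "dir with space" / "data.parquet"
uri = filesystem_uri_sketch(spaced)
print(uri)

# as_uri() refuses relative paths, so callers must pass absolute ones.
relative_error = None
try:
    filesystem_uri_sketch("relative dir/data.parquet")
except ValueError as exc:
    relative_error = exc
print(f"relative path rejected: {relative_error}")
```

The printed URI depends on the platform's temp directory, but in either case the
spaces in the path should appear as `%20`.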
**For the manual URI construction in
`pyarrow/tests/parquet/test_metadata.py` (line ~690):**
```python
# From:
file_uri = 'file:///' + file_path
# To:
from pathlib import Path
file_uri = Path(file_path).as_uri()
```
### Environment
- **OS**: Windows 11
- **PyArrow version**: main branch (development build)
- **Python version**: 3.13
I'm happy to submit a PR with the fix if this approach looks good to the
maintainers.
### Component(s)
Python