Torisels opened a new issue, #445:
URL: https://github.com/apache/hudi-rs/issues/445
### Is there an existing issue for this?
- [x] I have searched the existing issues
### Description of the bug
On Windows, the library uses `PathBuf` for joining paths, which results in
paths with backslashes (`\`). This causes problems when interacting with cloud
storage APIs or POSIX-style APIs that expect forward slashes (`/`) as the path
separator. As a result, file operations may fail, or the generated paths may
not be accepted by external systems. There is no documented workaround or
environment variable to force forward slash normalization for cross-platform
compatibility.
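The effect can be reproduced on any OS with a small Python analogy. This is only a sketch, not the hudi-rs code path: it assumes that joining with Windows separator rules (here `ntpath.join`, which is what `os.path.join` resolves to on Windows) mirrors what `PathBuf` does there. The directory and file names are taken from the error log in this issue.

```python
import ntpath      # os.path implementation used on Windows
import posixpath   # os.path implementation used on POSIX systems

# Names taken from the error log in this issue.
timeline_dir = "my_table/.hoodie"
instant_file = "20250909105111479.deltacommit"

# Windows-style join: existing separators are kept, but the separator
# inserted between the two parts is a backslash.
windows_joined = ntpath.join(timeline_dir, instant_file)

# POSIX-style join: the inserted separator is always a forward slash.
posix_joined = posixpath.join(timeline_dir, instant_file)

print(windows_joined)
print(posix_joined)
```

The Windows-style result contains a single backslash before the file name, which is the same shape as the key S3 rejects in the log.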
### Steps To Reproduce
1. Use the Python `hudi` package to access a Hudi table in S3 on Windows.
2. Run example code:
```python
from hudi import HudiTableBuilder
import pyarrow as pa
hudi_table = HudiTableBuilder.from_base_uri("s3://path/to_your_hudi/").build()
batches = hudi_table.read_snapshot(filters=[("city", "=", "san_francisco")])
# convert to PyArrow table
arrow_table = pa.Table.from_batches(batches)
result = arrow_table.select(["rider", "city", "ts", "fare"])
print(result)
```
3. Observe the S3 error: the path to the file under `.hoodie` is joined with a
backslash instead of a forward slash.
### Expected behavior
Paths should always use forward slashes (`/`) when interacting with external
APIs, regardless of the underlying OS. There should be a documented way (e.g.,
environment variable or API option) to normalize paths for cross-platform
compatibility.
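A minimal sketch of the kind of normalization step meant here, written in Python for illustration. `join_object_key` is a hypothetical helper, not part of the hudi-rs or Python `hudi` API, and it assumes object keys never contain a literal backslash (S3 technically allows one), so every backslash can be treated as a separator.

```python
def join_object_key(*parts: str) -> str:
    """Join path segments into an object-store key that uses '/' only.

    Hypothetical helper for illustration; not part of any hudi API.
    """
    segments = []
    for part in parts:
        # Treat backslashes as separators, then drop the empty segments
        # produced by leading, trailing, or doubled separators.
        segments.extend(s for s in part.replace("\\", "/").split("/") if s)
    return "/".join(segments)


# Segment values taken from the error log in this issue.
key = join_object_key("my_table", ".hoodie\\", "20250909105111479.deltacommit")
print(key)
```

Applying a step like this only at the boundary where keys are handed to the object store would leave local filesystem paths untouched.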
### Screenshots / Logs
Storage error: Object at location
my_table/.hoodie\20250909105111479.deltacommit not found: Client error with
status 404 Not Found: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not
exist.</Message><Key>my_table/.hoodie\20250909105111479.deltacommit</Key><RequestId>XXXX</RequestId><HostId>YYYY</HostId></Error>
### Software information
- Operating system: Windows
- Project version: 0.4.0
### Additional context
This bug can cause major interoperability problems for users working on
Windows, especially with cloud storage (S3, GCS, etc.). Please consider adding
a normalization step or a config option/environment variable to always use
POSIX-style paths when interacting with external systems.