Torisels opened a new issue, #445:
URL: https://github.com/apache/hudi-rs/issues/445

   ### Is there an existing issue for this?
   
   - [x] I have searched the existing issues
   
   ### Description of the bug
   
   On Windows, the library uses `PathBuf` for joining paths, which results in 
paths with backslashes (`\`). This causes problems when interacting with cloud 
storage APIs or POSIX-style APIs that expect forward slashes (`/`) as the path 
separator. As a result, file operations may fail, or the generated paths may 
not be accepted by external systems. There is no documented workaround or 
environment variable to force forward slash normalization for cross-platform 
compatibility.
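
The effect can be reproduced on any operating system with a small Python analogue. This is only a sketch: it does not use the library's Rust code, and it assumes that joining with Windows path rules (`ntpath.join`) behaves like the `PathBuf` join described above. The prefix and file name are taken from the error log in this report.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

prefix = "my_table/.hoodie"
name = "20250909105111479.deltacommit"

# Joining with Windows rules inserts a backslash before the file name,
# producing the mixed-separator key seen in the error log.
windows_key = ntpath.join(prefix, name)

# Joining with POSIX rules keeps forward slashes throughout,
# which is what object stores such as S3 expect.
posix_key = posixpath.join(prefix, name)

print(windows_key)
print(posix_key)
```

The first key mixes `/` and `\`, so the object store treats it as a different, non-existent object.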
   
   ### Steps To Reproduce
   
   1. Use the Python `hudi` package to access a Hudi table in S3 on Windows.
   
   2. Run example code:
   ```python
   from hudi import HudiTableBuilder
   import pyarrow as pa
   
   hudi_table = HudiTableBuilder.from_base_uri("s3://path/to_your_hudi/").build()
   batches = hudi_table.read_snapshot(filters=[("city", "=", "san_francisco")])
   
   # convert to PyArrow table
   arrow_table = pa.Table.from_batches(batches)
   result = arrow_table.select(["rider", "city", "ts", "fare"])
   print(result)
   ```
   3. Observe the S3 error: the object key for the file under `.hoodie` is built with a backslash instead of a forward slash.
   
   ### Expected behavior
   
   Paths should always use forward slashes (`/`) when interacting with external 
APIs, regardless of the underlying OS. There should be a documented way (e.g., 
environment variable or API option) to normalize paths for cross-platform 
compatibility.
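
A normalization step of this kind could look roughly like the following. This is a minimal sketch, not the library's implementation: `to_object_key` is a hypothetical helper name, and it assumes that every backslash in a fragment is meant as a separator (S3 itself allows `\` inside keys, so a real fix would need to apply this only to paths the library builds).

```python
def to_object_key(*parts: str) -> str:
    """Join path fragments into an object-store key that uses only '/'.

    Hypothetical helper for illustration; not part of the hudi API.
    """
    segments = []
    for part in parts:
        # Treat backslashes as separators, then drop empty segments
        # caused by leading, trailing, or doubled separators.
        segments.extend(s for s in part.replace("\\", "/").split("/") if s)
    return "/".join(segments)


key = to_object_key("my_table/.hoodie\\", "20250909105111479.deltacommit")
print(key)
```

Building keys from segments joined with `/` avoids relying on the host operating system's separator at all.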
   
   
   ### Screenshots / Logs
   
   Storage error: Object at location my_table/.hoodie\20250909105111479.deltacommit not found: Client error with status 404 Not Found: <?xml version="1.0" encoding="UTF-8"?>
   <Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>my_table/.hoodie\20250909105111479.deltacommit</Key><RequestId>XXXX</RequestId><HostId>YYYY</HostId></Error>
   
   ### Software information
   
   - Operating system: Windows
   - Project version: 0.4.0
   
   
   ### Additional context
   
   This bug can cause major interoperability problems for users working on 
Windows, especially with cloud storage (S3, GCS, etc.). Please consider adding 
a normalization step or a config option/environment variable to always use 
POSIX-style paths when interacting with external systems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

