[PR] [python][ray] Preserve schema for empty reads [paimon]

via GitHub Thu, 04 Jun 2026 00:47:35 -0700


QuakeWang opened a new pull request, #8118:
URL: https://github.com/apache/paimon/pull/8118


   ### Purpose
   
   The top-level Ray `read_paimon` API planned reads through `RayDatasource`. 
When a table scan produced no splits, `RayDatasource.get_read_tasks()` returned 
no read tasks, so the empty dataset Ray built did not carry the Paimon table 
schema.
   
   This was inconsistent with `TableRead.to_ray()`, which already returns an 
empty Arrow-backed Ray dataset with the planned read schema.
   
   This PR makes `read_paimon` use the planned `read_type` to build an empty 
Arrow table when there are no splits, so empty reads preserve both the table 
schema and any column projection. It also lazily imports `ray.data` and reports 
an actionable `pypaimon[ray]` install hint when Ray is missing.
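The lazy `ray.data` import can be sketched with `importlib`: Ray is imported only when a Ray read is requested, and a missing installation becomes an `ImportError` that names the extra to install. The helper name `import_ray_data` and the exact message wording are hypothetical; only the `pypaimon[ray]` extra comes from this PR.

```python
import importlib


def import_ray_data():
    """Import `ray.data` only when a Ray read is actually requested.

    Hypothetical helper: re-raises a missing-Ray ImportError with an
    install hint for the `pypaimon[ray]` extra.
    """
    try:
        return importlib.import_module("ray.data")
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise ImportError(
            "Reading Paimon tables with Ray requires Ray to be installed. "
            "Install it with: pip install 'pypaimon[ray]'"
        ) from exc
```

Deferring the import keeps code that merely references the Ray integration importable when Ray is absent; the error, with its install hint, surfaces only when a Ray read is attempted.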
   
   ### Tests
   
   - `python -m pytest paimon-python/pypaimon/tests/ray_integration_test.py::RayIntegrationTest -q`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
