Re: [PR] [python] Add self-contained Ray datasource and top-level read_paimon/write_paimon API [paimon]

via GitHub Thu, 30 Apr 2026 00:22:45 -0700


TheR1sing3un commented on PR #7740:
URL: https://github.com/apache/paimon/pull/7740#issuecomment-4350454594


   Thanks for the review! Addressed all three comments in c4e86cd0:
   
   1. **`_ensure_planned()` helper**: the `splits` and `read_type` properties
now share a single entry point that runs the `ReadBuilder` plan once and
populates both fields together, instead of each property doing its own
`if x is None: self._plan()` check.
   
   2. **`_from_table_read` no longer bypasses `__init__`**: added a private
`_resolved` parameter to `__init__` that takes a `(table, splits, read_type)`
tuple and defaults to `None`. When supplied, the catalog/identifier lookup is
skipped and the pre-resolved values are used directly. `_from_table_read` now
forwards through `__init__`, so any field added to `__init__` in the future is
initialized on both construction paths. Also added validation:
`table_identifier` and `catalog_options` are required when `_resolved is None`.
   
   3. **`limit` test case**: added `test_read_paimon_with_limit` to
`ray_integration_test.py`. It writes 10 rows across two partitions, which
forces two raw-convertible splits, and asserts that `limit=3` makes the scan
drop the second split (Ray Dataset row count < 10), with the full unbounded
read as a sanity baseline. The assertion uses `< 10` rather than an exact
`== N` because Paimon's scan-time limit is applied at whole-split granularity
at this layer; row-exact short-circuiting in the reader is a separate follow-up.
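
   To make the `< 10` assertion in item 3 concrete, here is a toy model of a whole-split limit. It is only a sketch under assumptions: `apply_split_level_limit` and the `(name, row_count)` split tuples are hypothetical illustration names, not pypaimon APIs, and the real scan may differ in detail.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a planned split: (partition name, row count).
Split = Tuple[str, int]


def apply_split_level_limit(splits: List[Split], limit: Optional[int]) -> List[Split]:
    """Keep whole splits until their combined row count reaches `limit`.

    Assumed behaviour for illustration: a split is never cut in half,
    so the kept rows can exceed `limit`.
    """
    if limit is None:
        return list(splits)
    kept: List[Split] = []
    rows = 0
    for split in splits:
        if rows >= limit:
            break  # enough rows are already covered, drop the remaining splits
        kept.append(split)
        rows += split[1]
    return kept


# Mirrors the test setup: 10 rows across two partitions, 5 rows each.
splits = [("dt=1", 5), ("dt=2", 5)]
bounded = apply_split_level_limit(splits, limit=3)
unbounded = apply_split_level_limit(splits, limit=None)
bounded_rows = sum(n for _, n in bounded)
unbounded_rows = sum(n for _, n in unbounded)
```

   In this model the bounded read returns the whole first split, which is more than `limit` but fewer than the unbounded total, so only an inequality against the baseline is safe to assert.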
   
   Tests: `pypaimon/tests/ray_integration_test.py` 9/9 pass, flake8 clean. 
Ready for re-review.
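
   For anyone skimming, the construction flow from items 1 and 2 looks roughly like the sketch below. It is a simplified stand-in, not the code in the PR: `DatasourceSketch`, the `planner` callable, and the `plan_calls` counter are hypothetical names added for illustration, and the catalog lookup is replaced by a plain tuple.

```python
from typing import Callable, List, Optional, Tuple

Plan = Tuple[List[str], List[str]]  # (splits, read_type), simplified to lists of names


class DatasourceSketch:
    """Toy model of the two construction paths; not the pypaimon class."""

    def __init__(
        self,
        table_identifier: Optional[str] = None,
        catalog_options: Optional[dict] = None,
        *,
        planner: Optional[Callable[[], Plan]] = None,  # hypothetical plan hook
        _resolved: Optional[Tuple[object, List[str], List[str]]] = None,
    ):
        # Fields common to both paths are set here, once.
        self._planner = planner or (lambda: ([], []))
        self.plan_calls = 0  # illustration only: counts how often planning runs
        if _resolved is not None:
            # Pre-resolved path: skip the catalog/identifier lookup.
            self._table, self._splits, self._read_type = _resolved
        else:
            if table_identifier is None or catalog_options is None:
                raise ValueError(
                    "table_identifier and catalog_options are required "
                    "when _resolved is None"
                )
            # Stand-in for the real catalog lookup.
            self._table = (table_identifier, dict(catalog_options))
            self._splits = None
            self._read_type = None

    @classmethod
    def _from_table_read(cls, table, splits, read_type):
        # Forwards through __init__ instead of bypassing it.
        return cls(_resolved=(table, splits, read_type))

    def _ensure_planned(self) -> None:
        # Single entry point: plan once, populate both fields together.
        if self._splits is None:
            self.plan_calls += 1
            self._splits, self._read_type = self._planner()

    @property
    def splits(self) -> List[str]:
        self._ensure_planned()
        return self._splits

    @property
    def read_type(self) -> List[str]:
        self._ensure_planned()
        return self._read_type


lazy = DatasourceSketch(
    "db.tbl", {"warehouse": "/tmp/wh"}, planner=lambda: (["s0", "s1"], ["id"])
)
lazy_result = (lazy.splits, lazy.read_type)
pre = DatasourceSketch._from_table_read("tbl", ["s0"], ["id"])
```

   Accessing both properties on `lazy` triggers a single plan, while `pre` never plans because its splits arrive already resolved.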

