TheR1sing3un opened a new pull request, #8269:
URL: https://github.com/apache/paimon/pull/8269
## Purpose
`RayDatasource` rebuilds the worker-side `TableRead` from the split provider's
`(table, read_type, predicate, limit)` but **not** its `nested_name_paths`.
As a result, `read_paimon(..., projection=['payload.a'])` (a nested-leaf
projection) reads every projected leaf as `NULL`, because the worker treats
the flattened leaf name (e.g. `payload_a`) as a missing top-level column.
The non-Ray read path is unaffected because `ReadBuilder.new_read()` already
forwards `nested_name_paths` to `TableRead`.
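
The failure mode can be illustrated with a toy model in plain Python. This is only a sketch, not pypaimon code: the `project` helper, the dict-based rows, and the list-of-paths shape used for `nested_name_paths` are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def project(
    rows: List[Row],
    read_names: List[str],
    nested_name_paths: Optional[List[List[str]]] = None,
) -> List[Row]:
    """Hypothetical stand-in for a worker-side read (not the pypaimon API)."""
    out = []
    for row in rows:
        projected = {}
        for i, name in enumerate(read_names):
            if nested_name_paths is not None:
                # Walk the nested path, e.g. ["payload", "a"].
                value: Any = row
                for part in nested_name_paths[i]:
                    value = value.get(part) if isinstance(value, dict) else None
            else:
                # Without paths, the flattened name is looked up as a
                # top-level column; a missing column yields None (NULL).
                value = row.get(name)
            projected[name] = value
        out.append(projected)
    return out


rows = [
    {"id": 1, "payload": {"a": 10, "b": "x"}},
    {"id": 2, "payload": {"a": 20, "b": "y"}},
]
read_names = ["id", "payload_a"]          # flattened leaf names
paths = [["id"], ["payload", "a"]]        # assumed shape of the nested paths

without_paths = project(rows, read_names)        # models the Ray path before the fix
with_paths = project(rows, read_names, paths)    # models the path with forwarding
```

In this model, `without_paths` carries `None` for `payload_a` in every row, while `with_paths` carries the real leaf values.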
## Fix
- `SplitProvider` exposes `nested_name_paths()`: resolved via the read builder
  for `CatalogSplitProvider`, carried from the source `TableRead` for
  `PreResolvedSplitProvider` (`TableRead.to_ray`).
- `RayDatasource` forwards it into the per-task worker `TableRead`.

The change is a no-op for non-nested / top-level-only projections
(`nested_name_paths` is `None` there), so existing reads are unaffected.
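
The forwarding described above can be sketched roughly as follows. Every class and function name here (`CatalogProviderSketch`, `PreResolvedProviderSketch`, `WorkerReadSketch`, `build_worker_read`) is hypothetical and only mirrors the shape of the change; the real pypaimon classes have different constructors and resolve paths through the read builder rather than by splitting strings.

```python
from dataclasses import dataclass
from typing import List, Optional

Paths = Optional[List[List[str]]]


@dataclass
class WorkerReadSketch:
    """Hypothetical stand-in for the per-task worker-side read."""
    read_names: List[str]
    nested_name_paths: Paths = None


class CatalogProviderSketch:
    """Resolves paths from the projection (assumed behaviour, for illustration)."""

    def __init__(self, projection: List[str]):
        self.projection = projection

    def read_names(self) -> List[str]:
        return [p.replace(".", "_") for p in self.projection]

    def nested_name_paths(self) -> Paths:
        # Top-level-only projections yield None, so the change is a no-op.
        if not any("." in p for p in self.projection):
            return None
        return [p.split(".") for p in self.projection]


class PreResolvedProviderSketch:
    """Carries paths already resolved by a source read."""

    def __init__(self, read_names: List[str], nested_name_paths: Paths):
        self._read_names = read_names
        self._paths = nested_name_paths

    def read_names(self) -> List[str]:
        return self._read_names

    def nested_name_paths(self) -> Paths:
        return self._paths


def build_worker_read(provider) -> WorkerReadSketch:
    # The essence of the fix: forward the paths alongside the other fields.
    return WorkerReadSketch(provider.read_names(), provider.nested_name_paths())


nested = build_worker_read(CatalogProviderSketch(["id", "payload.a"]))
flat = build_worker_read(CatalogProviderSketch(["id", "name"]))
carried = build_worker_read(
    PreResolvedProviderSketch(["id", "payload_a"], [["id"], ["payload", "a"]])
)
```

Here `nested` and `carried` both end up with paths for `payload.a`, while `flat` keeps `nested_name_paths` as `None`, matching the no-op case for top-level-only projections.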
## Tests
Adds `RayIntegrationTest.test_read_paimon_with_nested_projection`, asserting
that a `['id', 'payload.a']` projection returns the real leaf values instead
of `NULL`.
## Does this PR introduce a user-facing change?
No.
## Documentation
No documentation change needed.
---
Generative AI disclosure: drafted with AI assistance and reviewed by the author.