TheR1sing3un opened a new pull request, #8269:
URL: https://github.com/apache/paimon/pull/8269

   ## Purpose
   
   `RayDatasource` rebuilds the worker-side `TableRead` from the split provider's
   `(table, read_type, predicate, limit)` but **not** its `nested_name_paths`.
   As a result, `read_paimon(..., projection=['payload.a'])` (a nested-leaf
   projection) reads every projected leaf as `NULL`: the worker treats the
   flattened leaf name (e.g. `payload_a`) as a missing top-level column. The
   non-Ray read path is unaffected because `ReadBuilder.new_read()` already
   forwards `nested_name_paths` to `TableRead`.
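
The failure mode can be illustrated with a toy model. This is only a sketch and not pypaimon code: `project`, the row layout, and the dict shape used for `nested_name_paths` (flattened name mapped to a field path) are assumptions made for illustration.

```python
# Toy model only: none of these names are pypaimon APIs.
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def project(row: Row, columns: List[str],
            nested_name_paths: Optional[Dict[str, List[str]]] = None) -> Row:
    """Project `columns` from `row`.

    `nested_name_paths` (assumed shape) maps a flattened output name such as
    'payload_a' to the field path leading to the leaf, e.g. ['payload', 'a'].
    Without it, every name is looked up as a top-level column.
    """
    out: Row = {}
    for name in columns:
        path = (nested_name_paths or {}).get(name, [name])
        value: Any = row
        for part in path:
            # A field that is missing at any level yields None (NULL).
            value = value.get(part) if isinstance(value, dict) else None
        out[name] = value
    return out


row = {"id": 1, "payload": {"a": "x", "b": "y"}}
columns = ["id", "payload_a"]

# Paths dropped on the way to the worker: 'payload_a' looks like a missing column.
without_paths = project(row, columns)
# Paths forwarded: the leaf is resolved through 'payload' -> 'a'.
with_paths = project(row, columns, {"payload_a": ["payload", "a"]})
```

In this model `without_paths["payload_a"]` is `None` while `with_paths["payload_a"]` is the real leaf value, which mirrors the symptom described above.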
   
   ## Fix
   
   - `SplitProvider` exposes `nested_name_paths()`: resolved via the read builder
     for `CatalogSplitProvider`, carried from the source `TableRead` for
     `PreResolvedSplitProvider` (`TableRead.to_ray`).
   - `RayDatasource` forwards it into the per-task worker `TableRead`.
   
   The change is a no-op for non-nested / top-level-only projections
   (`nested_name_paths` is `None` there), so existing reads are unaffected.
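
The forwarding described in this section could look roughly like the following. It is a simplified sketch under stated assumptions, not the code in this PR: `ToySplitProvider`, `ToyTableRead`, `build_worker_read`, and the `forward_nested` switch are hypothetical stand-ins, and the dict shape of `nested_name_paths` is assumed.

```python
# Hypothetical stand-ins; the real classes live in pypaimon and differ in detail.
from dataclasses import dataclass
from typing import Dict, List, Optional

NestedPaths = Optional[Dict[str, List[str]]]


@dataclass
class ToyTableRead:
    read_type: List[str]
    nested_name_paths: NestedPaths = None


class ToySplitProvider:
    def __init__(self, read_type: List[str],
                 nested_name_paths: NestedPaths = None) -> None:
        self._read_type = read_type
        self._nested_name_paths = nested_name_paths

    def read_type(self) -> List[str]:
        return self._read_type

    def nested_name_paths(self) -> NestedPaths:
        # None for non-nested / top-level-only projections.
        return self._nested_name_paths


def build_worker_read(provider: ToySplitProvider,
                      forward_nested: bool = True) -> ToyTableRead:
    """Rebuild the per-task read; `forward_nested=False` models the old behavior."""
    return ToyTableRead(
        read_type=provider.read_type(),
        nested_name_paths=provider.nested_name_paths() if forward_nested else None,
    )


nested = ToySplitProvider(["id", "payload_a"], {"payload_a": ["payload", "a"]})
top_level = ToySplitProvider(["id"])

fixed = build_worker_read(nested)
buggy = build_worker_read(nested, forward_nested=False)
```

For `top_level`, the rebuilt read is identical with and without forwarding, which is the no-op property claimed for existing reads.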
   
   ## Tests
   
   Adds `RayIntegrationTest.test_read_paimon_with_nested_projection`, asserting
   that a `['id', 'payload.a']` projection returns the real leaf values instead
   of `NULL`.
   
   ## Does this PR introduce a user-facing change?
   
   No.
   
   ## Documentation
   
   No documentation change needed.
   
   ---
   Generative AI disclosure: drafted with AI assistance and reviewed by the author.
   

