TheR1sing3un opened a new pull request, #7801:
URL: https://github.com/apache/paimon/pull/7801

   ## Purpose
   
   Completes #7796: PK tables now serve nested-field projections through the
   merge reader.
   Before this PR, ``with_projection(['mv.latest_version'])`` on a PK table
   whose split went through the merge path raised ``NotImplementedError``.
   
   ## Approach
   
   The inner reader keeps the full ROW sub-structure, so the deduplicate,
   partial-update, and aggregation merge functions still see the original row.
   The new
   ``OuterProjectionRecordReader`` sits above the merge unwrap and walks each
   configured name path on the inner row to emit a flat OffsetRow matching the
   user-visible read schema. This mirrors Java's NestedProjectedRowData split.
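
A minimal sketch of the path-walking idea described above, not the PR's actual code. It models inner rows as nested dicts instead of real row objects; the names `walk_path` and `OuterProjectionSketch` are hypothetical, and treating a null parent struct as a null leaf is an assumption about the "None handling" the tests mention.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Sequence, Tuple


def walk_path(row: Optional[Dict[str, Any]], path: Sequence[str]) -> Any:
    """Follow one name path through nested dicts; a None parent yields None."""
    current: Any = row
    for name in path:
        if current is None:
            return None
        current = current[name]
    return current


class OuterProjectionSketch:
    """Hypothetical stand-in: wraps an iterable of full inner rows and emits
    flat tuples, one value per configured dotted path, in the order given."""

    def __init__(self, inner: Iterable[Dict[str, Any]], paths: List[str]):
        if not paths:
            # Assumption: an empty projection is rejected up front.
            raise ValueError("at least one projection path is required")
        self._inner = inner
        self._paths = [tuple(p.split(".")) for p in paths]

    def __iter__(self) -> Iterator[Tuple[Any, ...]]:
        for row in self._inner:
            yield tuple(walk_path(row, path) for path in self._paths)


rows = [
    {"id": 1, "mv": {"latest_version": 7, "note": "a"}},
    {"id": 2, "mv": None},  # null struct: nested leaf becomes None
]
projected = list(OuterProjectionSketch(rows, ["mv.latest_version", "id"]))
```

Because the wrapper only reads from the inner rows, the merge functions underneath keep operating on the unnarrowed structure, which is the point of splitting the projection into an inner and an outer step.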
   
   ## Commits
   
   1. ``Add OuterProjectionRecordReader for nested-field PK reads``: the
      wrapper and a 16-case unit test.
   2. ``Fix primary-key read when value projection drops PK columns``: a
      pre-existing bug surfaced by this work. The PARQUET/ORC/Lance/Vortex
      format readers used
      merge-internal aliases (``_KEY_id``) to look up DataFields, but the file
      stores PK columns under their bare name (``id``). When projection narrowed
      value fields enough to drop the bare PK from ``self.read_fields``, the
      format reader silently skipped the PK column and the merge KeyValue layout
      raised ``Offset N plus arity M is out of row length L``. The fix builds
      the lookup from both name spaces.
   3. ``Support nested-field projection on primary-key tables``: wire commit 1
      into ``MergeFileSplitRead`` and remove the guard in ``TableRead``.
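
The lookup fix in commit 2 can be illustrated roughly as follows. This is a simplified sketch that uses plain strings where the real readers work with DataFields; `build_field_lookup` and `resolve_file_columns` are hypothetical names, and only the ``_KEY_`` alias prefix is taken from the description above.

```python
from typing import Dict, List

KEY_PREFIX = "_KEY_"  # merge-internal alias prefix, as in ``_KEY_id``


def build_field_lookup(table_fields: List[str], primary_keys: List[str]) -> Dict[str, str]:
    """Map every name a reader may be asked for to the column stored in the file.

    Both name spaces are registered: the bare name (``id``) and the
    merge-internal alias (``_KEY_id``) resolve to the same file column.
    """
    lookup = {name: name for name in table_fields}
    for pk in primary_keys:
        lookup[KEY_PREFIX + pk] = pk
    return lookup


def resolve_file_columns(requested: List[str], lookup: Dict[str, str]) -> List[str]:
    """Translate requested names to file columns, dropping duplicates in order."""
    columns: List[str] = []
    for name in requested:
        column = lookup[name]
        if column not in columns:
            columns.append(column)
    return columns


lookup = build_field_lookup(["id", "name", "mv"], primary_keys=["id"])
# The value projection kept only ``mv``, so the PK is requested via its alias alone.
columns = resolve_file_columns(["_KEY_id", "mv"], lookup)
```

With a lookup built only from the projected value fields, ``_KEY_id`` would have no entry once the bare ``id`` was dropped, which corresponds to the silently skipped PK column described in commit 2.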
   
   ## Tests
   
   - 16 new unit tests for ``OuterProjectionRecordReader`` (None handling, path
     walks, batch reuse contract, row-kind inheritance, empty-input rejection).
   - 3 new e2e tests on PK tables: single sub-path, multiple sub-paths under the
     same struct, mixed top-level and nested ordering.
   - 1 new regression test on PK + value-only projection.
   - All existing append-only nested + reader_primary_key + reader_append_only +
     projection-utility + read-builder tests still pass.
   
   ## Out of scope
   
   - DataEvolution + nested projection (still raises ``NotImplementedError``);
     needs the same widen-then-extract treatment in a follow-up PR.
   - ARRAY<ROW> / MAP<X, ROW> nested projection (the Java side also keeps
     NestedProjection on ROW only).
   - Aggregation merge engine + nested combo regression: depends on
     PR-MERGE-AGGREGATION landing first.

