TheR1sing3un opened a new pull request, #7801:
URL: https://github.com/apache/paimon/pull/7801
## Purpose
Complete #7796: PK tables now serve nested-field projections through the
merge reader.
Before this PR, ``with_projection(['mv.latest_version'])`` on a PK table
whose split went through the merge path raised ``NotImplementedError``.
## Approach
The inner reader keeps the full ROW sub-structure, so the deduplicate,
partial-update, and aggregation merge functions still see the original row.
The new ``OuterProjectionRecordReader`` sits above the merge unwrap and walks
each configured name path on the inner row to emit a flat OffsetRow matching
the user-visible read schema. This mirrors Java's NestedProjectedRowData split.
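
The path walk described above can be sketched in a few lines. This is only an illustration, not the code in this PR: rows are modeled as plain dicts, and ``walk_path`` and ``project_outer`` are hypothetical names. The sketch assumes that a ``None`` struct at any level yields ``None`` for the projected field and that an empty path list is rejected, in line with the "None handling" and "empty-input rejection" test cases listed under Tests.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple


def walk_path(row: Optional[Mapping[str, Any]], path: Sequence[str]) -> Any:
    """Follow one name path into a nested row (hypothetical helper).

    A None encountered at any level short-circuits to None.
    """
    current: Any = row
    for name in path:
        if current is None:
            return None
        current = current[name]
    return current


def project_outer(
    row: Mapping[str, Any], paths: Sequence[Sequence[str]]
) -> Tuple[Any, ...]:
    """Flatten the nested inner row into one value per configured path."""
    if not paths:
        raise ValueError("at least one projection path is required")
    return tuple(walk_path(row, path) for path in paths)


# The inner (merged) row still carries the whole ``mv`` struct.
inner_row = {"id": 1, "mv": {"latest_version": 7, "history": [5, 6, 7]}}
# Mixed nested and top-level paths, in user-requested order.
paths = [["mv", "latest_version"], ["id"]]
flat = project_outer(inner_row, paths)
```

Because the extraction runs only after the merge has produced its output row, the merge functions never see a narrowed struct, which is the property the approach relies on.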
## Commits
1. ``Add OuterProjectionRecordReader for nested-field PK reads``: the
   wrapper plus a 16-case unit test.
2. ``Fix primary-key read when value projection drops PK columns``: a
   pre-existing bug surfaced by this work. The PARQUET/ORC/Lance/Vortex format
   readers used merge-internal aliases (``_KEY_id``) to look up DataFields, but
   the file stores PK columns under their bare name (``id``). When projection
   narrowed the value fields enough to drop the bare PK from
   ``self.read_fields``, the format reader silently skipped the PK column and
   the merge KeyValue layout raised
   ``Offset N plus arity M is out of row length L``. The fix builds the lookup
   from both name spaces.
3. ``Support nested-field projection on primary-key tables``: wire commit 1
   into ``MergeFileSplitRead`` and remove the guard in ``TableRead``.
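
The lookup change in commit 2 can be illustrated with a minimal sketch. It is not the actual reader code: fields are modeled as dicts with a ``name`` key, ``build_field_lookup`` is a hypothetical helper, and the only detail taken from the description above is that key columns carry a ``_KEY_`` alias while the file stores the bare name.

```python
from typing import Any, Dict, List

KEY_PREFIX = "_KEY_"  # merge-internal alias prefix, as in ``_KEY_id``


def build_field_lookup(read_fields: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index read fields by their alias and, for key fields, by bare name too."""
    lookup: Dict[str, Dict[str, Any]] = {}
    for field in read_fields:
        name = field["name"]
        # Always register the field under the name it carries.
        lookup[name] = field
        if name.startswith(KEY_PREFIX):
            # Also register the bare file-column name, unless a value
            # field with that exact name has already claimed it.
            lookup.setdefault(name[len(KEY_PREFIX):], field)
    return lookup


# Value projection kept only ``mv``; the bare ``id`` value field is gone,
# but the merge still needs the key column.
read_fields = [{"name": "_KEY_id"}, {"name": "mv"}]
lookup = build_field_lookup(read_fields)

# Columns as they are stored in the data file.
file_columns = ["id", "mv", "other"]
columns_to_read = [c for c in file_columns if c in lookup]
```

With an alias-only lookup, ``id`` would not match any entry and the key column would be skipped, which is the failure mode described in commit 2.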
## Tests
- 16 new unit tests for ``OuterProjectionRecordReader`` (None handling, path
walks, batch reuse contract, row-kind inheritance, empty-input rejection).
- 3 new e2e tests on PK tables: single sub-path, multiple sub-paths under the
same struct, mixed top-level and nested ordering.
- 1 new regression test on PK + value-only projection.
- All existing append-only nested + reader_primary_key + reader_append_only +
projection-utility + read-builder tests still pass.
## Out of scope
- DataEvolution + nested projection (still raises ``NotImplementedError``);
needs the same widen-then-extract treatment, follow-up PR.
- ARRAY<ROW> / MAP<X, ROW> nested projection (Java side also keeps
NestedProjection on ROW only).
- Aggregation merge engine + nested combo regression — depends on
PR-MERGE-AGGREGATION landing first.