leaves12138 commented on code in PR #7801:
URL: https://github.com/apache/paimon/pull/7801#discussion_r3214212014
##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -189,6 +189,15 @@ def file_reader_supplier(self, file: DataFileMeta,
for_merge_read: bool,
nested_path_by_name = self._nested_path_by_name()
has_nested = nested_path_by_name is not None
+ # Cover both the merge-internal aliases (``_KEY_id``) and the
+ # bare user-facing PK name (``id``) the file actually stores.
+ name_to_field: Dict[str, DataField] = {f.name: f for f in
self.read_fields}
Review Comment:
This alias-safe lookup is used below for Lance/Vortex/Parquet/ORC, but the
Avro branch still passes `self.read_fields` as `full_fields`. For a PK merge
read with `with_projection(['behavior'])` (or
`with_projection(['mv.latest_version'])` on an Avro PK table),
`_get_final_read_data_fields()` asks the file reader for the bare PK name such
as `user_id`, while `FormatAvroReader` builds `full_fields_map` from
`self.read_fields`, which only contains the merge alias `_KEY_user_id` when the
PK was dropped from the value projection. That raises `KeyError: 'user_id'`
before the merge reader can run.
Please apply the same bare-PK alias handling to the Avro path too (for
example by passing a full field list that also contains the trimmed bare PK
fields, or by letting `FormatAvroReader` receive resolved `DataField`s) and add
an Avro regression test for PK value-only projection / PK nested projection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]