TheR1sing3un commented on code in PR #7801:
URL: https://github.com/apache/paimon/pull/7801#discussion_r3214245880


##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -189,6 +189,15 @@ def file_reader_supplier(self, file: DataFileMeta, 
for_merge_read: bool,
         nested_path_by_name = self._nested_path_by_name()
         has_nested = nested_path_by_name is not None
 
+        # Cover both the merge-internal aliases (``_KEY_id``) and the
+        # bare user-facing PK name (``id``) the file actually stores.
+        name_to_field: Dict[str, DataField] = {f.name: f for f in 
self.read_fields}

Review Comment:
   > This alias-safe lookup is used below for Lance/Vortex/Parquet/ORC, but the 
Avro branch still passes `self.read_fields` as `full_fields`. For a PK merge 
read with `with_projection(['behavior'])` (or 
`with_projection(['mv.latest_version'])` on an Avro PK table), 
`_get_final_read_data_fields()` asks the file reader for the bare PK name such 
as `user_id`, while `FormatAvroReader` builds `full_fields_map` from 
`self.read_fields`, which only contains the merge alias `_KEY_user_id` when the 
PK was dropped from the value projection. That raises `KeyError: 'user_id'` 
before the merge reader can run.
   > 
   > Please apply the same bare-PK alias handling to the Avro path too (for 
example by passing a full field list that also contains the trimmed bare PK 
fields, or by letting `FormatAvroReader` receive resolved `DataField`s) and add 
an Avro regression test for PK value-only projection / PK nested projection.
   
   Nice catch! Already fixed. Please review it again. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to