Re: [PR] [python] Support schema evolution of nested struct sub-fields [paimon]

via GitHub Wed, 10 Jun 2026 00:45:41 -0700


TheR1sing3un commented on code in PR #8187:
URL: https://github.com/apache/paimon/pull/8187#discussion_r3386423297



##########
paimon-python/pypaimon/read/reader/data_file_batch_reader.py:
##########
@@ -57,55 +59,99 @@ def __init__(self, format_reader: RecordBatchReader, index_mapping: List[int], p
         self.file_io = file_io
         # Per-file field-id normalization: map the physically-read columns
         # (the file's own field order/names) onto the latest read target by
-        # field id, padding missing ids with NULL. ``None`` when there is no
-        # evolution to reconcile (identity) -- the common path stays zero-copy.
-        self._normalize_positions, self._normalize_names = \
-            self._build_normalize_plan(file_data_fields, target_data_fields)
+        # field id, padding missing ids with NULL and recursing into nested
+        # ROW / ARRAY<ROW> / MAP<.,ROW> sub-fields the same way. ``None`` when
+        # there is no evolution to reconcile -- the common path stays zero-copy.
+        self._normalize_plan = self._build_normalize_plan(file_data_fields, target_data_fields)

Review Comment:
   > The new nested field-id normalization is skipped for dotted nested 
projections. SplitRead.file_reader_supplier passes file_data_fields=None and 
target_data_fields=None whenever has_nested is true, so 
with_projection(['mv.renamed_leaf']) still reads by the new physical name from 
old files. I reproduced rename mv.s -> ss followed by projection ['id', 
'mv.ss']: the old row returned mv_ss=None instead of 'a'. A nested type change 
is worse: projecting the evolved leaf kept old batches as int32 and new batches 
as int64, causing pyarrow.lib.ArrowInvalid during concatenation.
   
   Fixed in b4d46ecc3.
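
   For readers following the thread, the rename case quoted above can be
   illustrated with a minimal pure-Python sketch of matching by field id.
   This is not the code from the PR or from pypaimon: the `Field` class and
   `normalize_row` helper are hypothetical names invented for illustration,
   and rows are modeled as plain dicts rather than Arrow batches. It only
   covers renames and missing fields, not type changes.

   ```python
   from dataclasses import dataclass
   from typing import Any, Dict, List, Optional


   @dataclass
   class Field:
       """Hypothetical schema field: a stable id, a name, optional ROW children."""
       id: int
       name: str
       children: Optional[List["Field"]] = None  # None means a leaf field


   def normalize_row(row: Dict[str, Any],
                     file_fields: List[Field],
                     target_fields: List[Field]) -> Dict[str, Any]:
       """Re-key a row read with the file's schema onto the target schema by field id."""
       file_by_id = {f.id: f for f in file_fields}
       out: Dict[str, Any] = {}
       for target in target_fields:
           source = file_by_id.get(target.id)
           if source is None:
               # The field id does not exist in the old file: pad with NULL.
               out[target.name] = None
               continue
           value = row.get(source.name)
           if target.children is not None and value is not None:
               # Nested ROW: recurse so renamed sub-fields are matched by id too.
               value = normalize_row(value, source.children or [], target.children)
           out[target.name] = value
       return out


   # Old file schema: mv.s has field id 2; the latest schema renames it to mv.ss.
   old_schema = [Field(0, "id"), Field(1, "mv", [Field(2, "s")])]
   new_schema = [Field(0, "id"), Field(1, "mv", [Field(2, "ss")])]

   old_row = {"id": 1, "mv": {"s": "a"}}
   normalized = normalize_row(old_row, old_schema, new_schema)
   ```

   Looking the sub-field up by its new name `ss` in the old row would yield
   None, which matches the symptom described in the quoted comment; matching
   on the id carries the value 'a' across the rename.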



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
