leaves12138 commented on code in PR #8014:
URL: https://github.com/apache/paimon/pull/8014#discussion_r3317181482


##########
paimon-python/pypaimon/read/split_read.py:
##########
@@ -268,10 +272,19 @@ def file_reader_supplier(self, file: DataFileMeta, for_merge_read: bool,
                 ordered_read_fields, read_arrow_predicate, batch_size=batch_size,
                 options=self.table.options,
                 nested_name_paths=ordered_nested_paths)
+        elif file_format == CoreOptions.FILE_FORMAT_ROW:
+            if has_nested:
+                raise NotImplementedError(
+                    "Nested-field projection is not supported on ROW files")
+            format_reader = FormatRowReader(
+                self.table.file_io, file_path, read_file_fields,
+                list(name_to_field.values()),

Review Comment:
   `FormatRowReader` decodes each row in physical field order, so it needs the
   complete physical ROW schema, not only the requested fields. Here the
   `full_fields` argument is built from `name_to_field`, which contains only the
   read/projection fields plus the trimmed lookup fields. For example, with a ROW
   append table `(id INT, name STRING, val DOUBLE)`,
   `with_projection(["id", "val"])` passes only `[id, val]`; `_decode_block` then
   decodes the bytes of `name` as `val` and returns corrupted doubles. Please pass
   the full file schema in physical order to `FormatRowReader` (while keeping
   `read_file_fields` as the projected columns).
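
A minimal, self-contained sketch of the failure mode described above. It does not use Paimon's actual ROW encoding or the real `FormatRowReader`; the fixed-width `struct` layout and the helpers `encode_row`, `decode_row`, and `read_projected` are hypothetical stand-ins. They only illustrate why a positional decoder has to see the full physical schema and apply the projection after decoding.

```python
import struct

# Hypothetical fixed-width layout, for illustration only.
# This is NOT Paimon's real ROW file encoding.
FULL_SCHEMA = [("id", "i"), ("name", "8s"), ("val", "d")]


def encode_row(schema, values):
    """Pack values back to back in physical schema order (little-endian, no padding)."""
    fmt = "<" + "".join(code for _, code in schema)
    return struct.pack(fmt, *values)


def decode_row(schema, data):
    """Decode positionally: each field consumes the next bytes in schema order."""
    row = {}
    offset = 0
    for name, code in schema:
        (value,) = struct.unpack_from("<" + code, data, offset)
        row[name] = value
        offset += struct.calcsize("<" + code)
    return row


def read_projected(full_schema, projection, data):
    """Decode with the full physical schema, then keep only the projected columns."""
    row = decode_row(full_schema, data)
    return {name: row[name] for name in projection}


row_bytes = encode_row(FULL_SCHEMA, (7, b"alice\x00\x00\x00", 3.5))

# Buggy pattern: the decoder is handed only the projected fields, so the
# bytes of `name` are interpreted as the double `val`.
projected_schema = [f for f in FULL_SCHEMA if f[0] in ("id", "val")]
bad = decode_row(projected_schema, row_bytes)

# Suggested pattern: full schema for decoding, projection applied afterwards.
good = read_projected(FULL_SCHEMA, ["id", "val"], row_bytes)
```

In this sketch `bad["id"]` is still correct because `id` is the first physical field, while `bad["val"]` is read from the wrong offset; `good` recovers both values.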


