[GitHub] [iceberg] moriyoshi commented on a diff in pull request #8144: Python: allow projection of Iceberg fields to pyarrow table schema with names

via GitHub Tue, 29 Aug 2023 04:34:22 -0700


moriyoshi commented on code in PR #8144:
URL: https://github.com/apache/iceberg/pull/8144#discussion_r1308677645



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -749,7 +869,16 @@ def _task_to_table(
             schema_raw = metadata.get(ICEBERG_SCHEMA)
         # TODO: if field_ids are not present, Name Mapping should be implemented to look them up in the table schema,
         #  see https://github.com/apache/iceberg/issues/7451
-        file_schema = Schema.parse_raw(schema_raw) if schema_raw is not None else pyarrow_to_schema(physical_schema)
+        file_schema = (
+            Schema.parse_raw(schema_raw)
+            if schema_raw is not None
+            else pyarrow_to_schema(
+                physical_schema,
+                projected_schema,

Review Comment:
   It doesn't have much to do with pruning. What we need to achieve with 
`ignore_unprojected_fields` here is simply to ignore redundant columns in the 
actual data, whereas the purpose of pruning is to take away fields that are 
already *known* according to the catalog. The two are similar, but have 
different semantics.
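
   To make the distinction concrete, here is a rough sketch in plain Python. 
It is only an illustration under assumptions: the helper names 
`convert_ignoring_unprojected` and `prune`, and the dict-based schemas, are 
hypothetical and are not part of pyiceberg or of this PR.

```python
from typing import Dict, List, Set


def convert_ignoring_unprojected(
    physical_columns: List[str], projected_ids_by_name: Dict[str, int]
) -> Dict[str, int]:
    """Hypothetical helper: resolve columns found in a data file to field IDs
    by name, skipping columns that the projection does not mention at all."""
    return {
        name: projected_ids_by_name[name]
        for name in physical_columns
        if name in projected_ids_by_name  # redundant columns are ignored
    }


def prune(known_schema: Dict[str, int], selected_ids: Set[int]) -> Dict[str, int]:
    """Hypothetical helper: drop fields from a schema whose fields are all
    already known (every field has an ID) unless they were selected."""
    return {name: fid for name, fid in known_schema.items() if fid in selected_ids}


# Case 1: the file carries a column the catalog knows nothing about.
file_columns = ["id", "name", "debug_blob"]
projected = {"id": 1, "name": 2}
file_schema = convert_ignoring_unprojected(file_columns, projected)

# Case 2: every field is known to the catalog; we only narrow the selection.
catalog_schema = {"id": 1, "name": 2, "email": 3}
pruned = prune(catalog_schema, selected_ids={1})
```

   In the first case `debug_blob` has no field ID to resolve to, so it can 
only be skipped; in the second case `name` and `email` are known fields that 
are deliberately dropped.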



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


