moriyoshi commented on code in PR #8144:
URL: https://github.com/apache/iceberg/pull/8144#discussion_r1308677645
##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -749,7 +869,16 @@ def _task_to_table(
schema_raw = metadata.get(ICEBERG_SCHEMA)
# TODO: if field_ids are not present, Name Mapping should be implemented to look them up in the table schema,
# see https://github.com/apache/iceberg/issues/7451
- file_schema = Schema.parse_raw(schema_raw) if schema_raw is not None else pyarrow_to_schema(physical_schema)
+ file_schema = (
+ Schema.parse_raw(schema_raw)
+ if schema_raw is not None
+ else pyarrow_to_schema(
+ physical_schema,
+ projected_schema,
Review Comment:
This isn't really related to pruning. What we need to achieve with
`ignore_unprojected_fields` here is simply to ignore redundant columns in the
actual data, whereas the purpose of pruning is to drop fields that are
already *known* according to the catalog. The two are similar, but they have
different semantics.
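To make the distinction concrete, here is a minimal sketch in plain Python. `Column`, `convert_ignoring_unprojected` and `prune` are hypothetical names used only for illustration. They are not pyiceberg's `pyarrow_to_schema` or its pruning code, and field matching is simplified to field IDs.

In this sketch, ignoring unprojected fields tolerates columns in the data file that nothing asked for, including columns without a field ID. Pruning instead starts from a schema that is fully known and selects a subset, so an unknown ID is treated as an error there.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Column:
    """Hypothetical, simplified field (not pyiceberg's NestedField)."""

    field_id: Optional[int]  # None models a column written without a field ID
    name: str


def convert_ignoring_unprojected(
    physical: Iterable[Column],
    projected_ids: Set[int],
    ignore_unprojected_fields: bool = True,
) -> List[Column]:
    """Read side: the data file may carry columns we know nothing about.

    Anything outside the projection is redundant for this read, so with the
    flag set it is silently skipped instead of being treated as an error.
    """
    result = []
    for column in physical:
        if column.field_id in projected_ids:
            result.append(column)
        elif not ignore_unprojected_fields:
            raise ValueError(f"Unexpected column in data file: {column.name!r}")
    return result


def prune(known_schema: Iterable[Column], selected_ids: Set[int]) -> List[Column]:
    """Catalog side: every field is already known; keep only the selected ones."""
    known = list(known_schema)
    missing = selected_ids - {c.field_id for c in known}
    if missing:
        # In this sketch the catalog schema is authoritative, so an unknown ID is a caller error.
        raise ValueError(f"Unknown field IDs: {sorted(missing)}")
    return [c for c in known if c.field_id in selected_ids]


physical = [Column(1, "id"), Column(2, "name"), Column(None, "extra")]
print(convert_ignoring_unprojected(physical, {1, 2}))
# [Column(field_id=1, name='id'), Column(field_id=2, name='name')]

catalog = [Column(1, "id"), Column(2, "name"), Column(3, "ts")]
print(prune(catalog, {1, 3}))
# [Column(field_id=1, name='id'), Column(field_id=3, name='ts')]
```

The two operations may yield the same columns for a given input. They differ in what they assume about the input: the first tolerates unknown columns in the file, while the second requires every selected field to already exist in the catalog schema.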
--
This is an automated message from the Apache Git Service.