kevinjqliu commented on code in PR #2662:
URL: https://github.com/apache/iceberg-python/pull/2662#discussion_r2491745861


##########
pyiceberg/io/pyarrow.py:
##########


Review Comment:
   I think we should at least check that the Parquet field IDs align with the 
Iceberg field IDs.
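
A minimal sketch of the kind of alignment check meant here, not the PR's actual implementation. The function name and the plain name-to-ID dictionaries are hypothetical stand-ins; in practice the Parquet side would be read from the file's schema (PyArrow exposes the ID in field metadata under the `PARQUET:field_id` key) and the Iceberg side from the table schema.

```python
from typing import Dict, List, Optional


def find_field_id_mismatches(
    parquet_ids: Dict[str, Optional[int]],
    iceberg_ids: Dict[str, int],
) -> List[str]:
    """Return a description of every Parquet field whose ID disagrees with the table.

    Both arguments are hypothetical name -> field ID mappings; a value of None
    on the Parquet side means the file carries no field ID for that column.
    """
    problems: List[str] = []
    for name, parquet_id in parquet_ids.items():
        if parquet_id is None:
            # No ID in the file: this column would rely on the name mapping instead.
            continue
        expected = iceberg_ids.get(name)
        if expected is None:
            problems.append(f"{name}: not present in the Iceberg schema")
        elif expected != parquet_id:
            problems.append(
                f"{name}: parquet field ID {parquet_id} != Iceberg field ID {expected}"
            )
    return problems
```

An empty result would mean the file's IDs are consistent with the table; anything else could be raised as an error before the file is committed.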



##########
mkdocs/docs/api.md:
##########
@@ -1006,9 +1006,11 @@ Expert Iceberg users may choose to commit existing 
parquet files to the Iceberg
 
 <!-- prettier-ignore-start -->
 
-!!! note "Name Mapping"
-    Because `add_files` uses existing files without writing new parquet files 
that are aware of the Iceberg's schema, it requires the Iceberg's table to have 
a [Name 
Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization)
 (The Name mapping maps the field names within the parquet files to the Iceberg 
field IDs). Hence, `add_files` requires that there are no field IDs in the 
parquet file's metadata, and creates a new Name Mapping based on the table's 
current schema if the table doesn't already have one.
-
+!!! note "Name Mapping and Field IDs"
+    `add_files` can work with Parquet files both with and without field IDs in 
their metadata:
+    - **Files with field IDs**: When field IDs are present in the Parquet 
metadata, they must match the corresponding field IDs in the Iceberg table 
schema. This is common for files generated by tools like Spark or by other 
libraries that write explicit field ID metadata.
+    - **Files without field IDs**: When field IDs are absent, the table must 
have a [Name 
Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization)
 to map field names to Iceberg field IDs. `add_files` will automatically create 
a Name Mapping based on the table's current schema if one doesn't already exist.
+    In both cases, a Name Mapping is created if the table doesn't have one, 
ensuring compatibility with various readers.

Review Comment:
   For Parquet files with field IDs, I don't think we necessarily need the name 
mapping if the IDs are aligned with the table schema field IDs.
   But we can address this separately.
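
To make the suggestion concrete, here is a rough sketch of the decision, assuming the file's IDs have already been verified against the table schema. The function name and the name-to-ID dictionary are hypothetical, not part of PyIceberg.

```python
from typing import Dict, Optional


def needs_name_mapping(parquet_ids: Dict[str, Optional[int]]) -> bool:
    """Return True if any column in the file lacks a field ID.

    `parquet_ids` is a hypothetical name -> field ID mapping, with None for a
    column that has no ID in the Parquet metadata. Only such columns need a
    name mapping to be resolved to Iceberg field IDs.
    """
    return any(field_id is None for field_id in parquet_ids.values())
```

Under this sketch, a file whose columns all carry IDs would not trigger creation of a name mapping, while a file with any missing ID still would.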



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

