mbutrovich opened a new pull request, #1804:
URL: https://github.com/apache/iceberg-rust/pull/1804

   ## Which issue does this PR close?
   
   - Partially addresses #1749. The doc comment from the new test describes the scenario:
   ```
       /// This reproduces the scenario from Iceberg Java's TestAddFilesProcedure where:
       /// - Hive-style partitioned Parquet files are imported via add_files procedure
       /// - Parquet files have field IDs: name (1), subdept (2)
       /// - Iceberg schema assigns different field IDs: id (1), name (2), dept (3), subdept (4)
       /// - Partition columns (id, dept) have initial_default values from manifests
       ///
       /// Without proper handling, this would incorrectly:
       /// 1. Try to read partition column "id" (field_id=1) from Parquet field_id=1 ("name")
       /// 2. Read data column "name" (field_id=2) from Parquet field_id=2 ("subdept")
       ///
       /// The fix ensures:
       /// 1. Partition columns with initial_default are ALWAYS read as constants (never from Parquet)
       /// 2. Data columns use name-based mapping when field ID conflicts are detected
   ```
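
The two rules in the comment above can be sketched in isolation. This is a minimal illustration only, not the code in this PR: `ColumnSource`, `IcebergField`, `ParquetField`, and `resolve_sources` are hypothetical names, constants are modeled as plain strings, and the `use_name_mapping` flag is assumed to be set by whatever conflict check the reader performs.

```rust
use std::collections::HashMap;

/// Where the value for one Iceberg column comes from (hypothetical type).
#[derive(Debug, PartialEq)]
enum ColumnSource {
    /// Supplied from partition metadata; the file is never consulted.
    Constant(String),
    /// Read from the Parquet column at this position in the file schema.
    Parquet(usize),
    /// Not present in the file and no constant is available.
    Missing,
}

/// Simplified stand-ins for the table schema and the file schema.
struct IcebergField {
    id: i32,
    name: &'static str,
}

struct ParquetField {
    id: i32,
    name: &'static str,
}

/// Resolve every Iceberg column to its source.
/// Rule 1: a column with a constant is always a constant.
/// Rule 2: remaining columns match by name when `use_name_mapping` is set,
///         otherwise by field ID.
fn resolve_sources(
    schema: &[IcebergField],
    file: &[ParquetField],
    constants: &HashMap<i32, String>,
    use_name_mapping: bool,
) -> Vec<ColumnSource> {
    schema
        .iter()
        .map(|field| {
            if let Some(value) = constants.get(&field.id) {
                return ColumnSource::Constant(value.clone());
            }
            let position = if use_name_mapping {
                file.iter().position(|p| p.name == field.name)
            } else {
                file.iter().position(|p| p.id == field.id)
            };
            position.map_or(ColumnSource::Missing, ColumnSource::Parquet)
        })
        .collect()
}
```

With the schema `id (1), name (2), dept (3), subdept (4)`, a file containing `name (1), subdept (2)`, and constants for field IDs 1 and 3, calling `resolve_sources` with `use_name_mapping = true` yields a constant for `id`, Parquet position 0 for `name`, a constant for `dept`, and Parquet position 1 for `subdept`. With no constants and ID-based matching, `id` would resolve to Parquet position 0, which is the file's `name` column, reproducing the bug described above.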
   
   ## What changes are included in this PR?
   
   - Detect conflicts in field ID mappings and resolve them similarly to Iceberg Java's `BaseParquetReaders.java` and `PartitionUtil.constantsMap()`
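
One plausible way to detect such a conflict is sketched below. The criterion is an assumption, not necessarily the check this PR implements: a conflict is flagged when a field ID embedded in the Parquet file also exists in the Iceberg schema but under a different column name. The function name `has_field_id_conflict` and the `(id, name)` tuple representation are hypothetical.

```rust
use std::collections::HashMap;

/// Returns true when any Parquet field ID is also present in the Iceberg
/// schema but refers to a column with a different name.
/// Assumed criterion for illustration only.
fn has_field_id_conflict(iceberg: &[(i32, &str)], parquet: &[(i32, &str)]) -> bool {
    // Index the table schema by field ID for constant-time lookups.
    let by_id: HashMap<i32, &str> = iceberg.iter().copied().collect();
    parquet
        .iter()
        .any(|(id, name)| matches!(by_id.get(id), Some(expected) if expected != name))
}
```

For the `add_files` scenario, the file's `name (1)` collides with the table's `id (1)`, so the check returns `true` and a reader could switch to name-based mapping for data columns. A file whose IDs agree with the table schema, or whose IDs do not appear in the table schema at all, returns `false` under this criterion.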
   
   ## Are these changes tested?
   
   - New test `add_files_partition_columns_with_field_id_conflict`
   - This fixed 42 tests in Iceberg Java's spark-extensions 
`TestAddFilesProcedure` suite when running with Comet's 
https://github.com/apache/datafusion-comet/pull/2528.

