dannycjones commented on code in PR #2367:
URL: https://github.com/apache/iceberg-rust/pull/2367#discussion_r3312416700


##########
crates/iceberg/src/transaction/snapshot.rs:
##########
@@ -375,11 +457,50 @@ impl<'a> SnapshotProducer<'a> {
 
         summary_collector.set_partition_summary_limit(partition_summary_limit);
 
+        // Helper: look up the partition spec for a file. Returns DataInvalid
+        // if the file references a spec that doesn't exist in the table.
+        let spec_for = |data_file: &DataFile| {
+            table_metadata
+                .partition_spec_by_id(data_file.partition_spec_id)
+                .cloned()
+                .ok_or_else(|| {
+                    Error::new(
+                        ErrorKind::DataInvalid,
+                        format!(
+                            "Cannot find partition spec {} for file: {}",
+                            data_file.partition_spec_id,
+                            data_file.file_path()
+                        ),
+                    )
+                })
+        };
+
         for data_file in &self.added_data_files {
             summary_collector.add_file(
                 data_file,
                 table_metadata.current_schema().clone(),
-                table_metadata.default_partition_spec().clone(),
+                spec_for(data_file)?,
+            );
+        }
+        for delete_file in &self.added_delete_files {
+            summary_collector.add_file(
+                delete_file,
+                table_metadata.current_schema().clone(),
+                spec_for(delete_file)?,
+            );
+        }
+        for data_file in &self.removed_data_files {
+            summary_collector.remove_file(
+                data_file,
+                table_metadata.current_schema().clone(),
+                spec_for(data_file)?,
+            );
+        }
+        for delete_file in &self.removed_delete_files {
+            summary_collector.remove_file(
+                delete_file,
+                table_metadata.current_schema().clone(),
+                spec_for(delete_file)?,
             );
         }

Review Comment:
   I think fixing this will be an involved change, assuming it is wrong. Since 
the stats are optional, shall we omit for now and open an issue to follow-up?
   
   Looking at the Java implementation, [it stores a reference to the schema for 
a given partition 
spec](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L57).
 Perhaps we should consider doing the same. This would allow us to always be 
able to pull up the associated partitions spec, and I'm hoping that it would 
simplify things a bit since we would not need to pass in a schema to the 
snapshot summary collector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to