xanderbailey commented on code in PR #2416:
URL: https://github.com/apache/iceberg-rust/pull/2416#discussion_r3232244908


##########
crates/iceberg/src/puffin/metadata.rs:
##########
@@ -324,7 +324,11 @@ impl FileMetadata {
                 return FileMetadata::read(input_file).await;
             }
 
-            // Read footer based on prefetchi hint
+            // Validate file header magic
+            let first_four_bytes = file_read.read(0..FileMetadata::MAGIC_LENGTH.into()).await?;

Review Comment:
   Confirmed that this matches 
https://github.com/apache/iceberg-rust/pull/2416/changes#diff-767abff39c635c6cdb9182fa8e196ace14cf9598e14a8249db3639c35e162311R281-R284
 and if `prefetch_hint` is greater than the file length then we hit the 
regular read path, which performs the same validation, so this seems correct to me.
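
   To spell out the reasoning, here is a rough, self-contained sketch of the two paths as I understand them. Everything in it is a stand-in rather than the crate's code: `ReadError`, `check_header_magic`, and the byte-slice signatures are hypothetical, and footer parsing is omitted. The only fact taken from outside this PR is the Puffin magic `PFA1`.

```rust
/// Puffin magic bytes ("PFA1").
const MAGIC: [u8; 4] = [0x50, 0x46, 0x41, 0x31];

/// Hypothetical error type, used only for this sketch.
#[derive(Debug, PartialEq)]
enum ReadError {
    BadHeaderMagic,
    TooShort,
}

/// Checks that the input starts with the magic bytes.
fn check_header_magic(bytes: &[u8]) -> Result<(), ReadError> {
    match bytes.get(..MAGIC.len()) {
        Some(head) if head == &MAGIC[..] => Ok(()),
        Some(_) => Err(ReadError::BadHeaderMagic),
        None => Err(ReadError::TooShort),
    }
}

/// Stand-in for the regular read path: validates the header first.
fn read(bytes: &[u8]) -> Result<(), ReadError> {
    check_header_magic(bytes)
    // Footer parsing omitted in this sketch.
}

/// Stand-in for the prefetch path.
fn read_with_prefetch(bytes: &[u8], prefetch_hint: u8) -> Result<(), ReadError> {
    // A hint larger than the file falls back to the regular path,
    // which already validates the header.
    if usize::from(prefetch_hint) > bytes.len() {
        return read(bytes);
    }
    // Otherwise the header is validated here, as this PR adds.
    check_header_magic(bytes)
    // Footer parsing from the prefetched tail omitted in this sketch.
}
```

   Either branch rejects a file whose first four bytes are not the magic, which is the property the new check is meant to guarantee.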



##########
crates/iceberg/src/puffin/metadata.rs:
##########
@@ -958,6 +962,33 @@ mod tests {
         assert_eq!(file_metadata, zstd_compressed_metric_file_metadata());
     }
 
+    #[tokio::test]
+    async fn test_read_with_incorrect_header_magic() {
+        let temp_dir = TempDir::new().unwrap();
+
+        let prefetch_hint: u8 = 64;
+        let mut bytes = vec![];
+        // Invalid header magic
+        bytes.extend([0x00, 0x00, 0x00, 0x00]);
+        // Intentionally keep file size larger than prefetch_hint.
+        bytes.extend(vec![0u8; prefetch_hint as usize]);
+        // Valid footer: magic + payload + footer struct
+        bytes.extend(FileMetadata::MAGIC);
+        bytes.extend(empty_footer_payload_bytes());
+        bytes.extend(empty_footer_payload_bytes_length_bytes());
+        bytes.extend(vec![0, 0, 0, 0]); // flags
+        bytes.extend(FileMetadata::MAGIC);
+
+        let input_file = input_file_with_bytes(&temp_dir, &bytes).await;
+
+        assert!(FileMetadata::read(&input_file).await.is_err(),);
+        assert!(
+            FileMetadata::read_with_prefetch(&input_file, prefetch_hint)
+                .await
+                .is_err(),

Review Comment:
   Might be nice to assert on the actual error here rather than asserting on any 
error?
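
   Something along these lines is what I have in mind. This is only a sketch with stand-in types so that it compiles on its own: `Error`, `ErrorKind`, `read_stub`, and the message text are all hypothetical, and the real assertion would use whatever kind and message the crate's error actually carries.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for the crate's error kind.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ErrorKind {
    DataInvalid,
    Unexpected,
}

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    message: String,
}

/// Stand-in for a read that fails on a bad header magic.
fn read_stub() -> Result<(), Error> {
    Err(Error {
        kind: ErrorKind::DataInvalid,
        // Message text is made up for this sketch.
        message: "bad magic value in file header".to_string(),
    })
}

/// Asserts that the result is the specific failure we expect,
/// not merely some error.
fn assert_bad_header_magic<T: Debug>(result: Result<T, Error>) {
    let err = result.unwrap_err();
    assert_eq!(err.kind, ErrorKind::DataInvalid);
    assert!(
        err.message.contains("magic"),
        "unexpected error message: {}",
        err.message
    );
}
```

   With a helper like this, the test would fail if the read started erroring for an unrelated reason (for example a footer parse failure), which a bare `is_err()` would not catch.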



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

