chenzl25 opened a new pull request, #2513: URL: https://github.com/apache/iceberg-rust/pull/2513
## What

Fix deletion vector reads from Puffin files by using the manifest-provided blob range for direct access.

For Puffin position delete files, Iceberg manifest entries carry `content_offset`, `content_size_in_bytes`, and `referenced_data_file`. These identify the deletion-vector blob inside the Puffin file. The reader now uses that range directly instead of first parsing the Puffin footer.

## Why

The previous path tried to parse Puffin file metadata before reading the deletion-vector blob. That fails when the read path is already scoped to the DV blob range, because the blob payload does not start with the Puffin file magic `PFA1`.

This could produce errors like:

```text
Bad magic value: [1, 0, 0, 0] should be [80, 70, 65, 49]
```

Spark handles deletion vectors through the manifest-provided blob offset/size, so this aligns iceberg-rust with the Iceberg direct-access model for deletion vectors.

Changes:

- Add `referenced_data_file` to `FileScanTask`.
- Propagate `referenced_data_file` from delete manifest entries into scan tasks.
- For Puffin deletion vectors, read `content_offset..content_offset + content_size_in_bytes` directly.
- Construct a `deletion-vector-v1` blob from the direct range and parse it with `DeleteVector::from_puffin_blob`.
- Keep the existing Puffin footer parsing path as fallback when `referenced_data_file` is unavailable.
- Use `path#offset:length` as the positional delete load key for Puffin files, so multiple DV blobs in one Puffin file are handled independently.
- Add a focused test covering direct blob-range reads from a real Puffin file.
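The direct-access range and the `path#offset:length` load key described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `DeleteFileRef`, `direct_blob_range`, `read_blob`, and `load_key` are hypothetical names, the in-memory byte slice stands in for a ranged file read, and treating a missing offset or size as a reason to fall back is an assumption.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the fields a delete manifest entry carries.
/// Field names follow the PR description, not the actual iceberg-rust types.
#[derive(Debug, Clone)]
struct DeleteFileRef {
    file_path: String,
    content_offset: Option<u64>,
    content_size_in_bytes: Option<u64>,
    referenced_data_file: Option<String>,
}

/// Returns the byte range of the deletion-vector blob inside the Puffin file.
/// Returns `None` when direct access is not possible, in which case a caller
/// would fall back to parsing the Puffin footer.
fn direct_blob_range(entry: &DeleteFileRef) -> Option<Range<u64>> {
    // Direct access is only used when the entry names its data file.
    entry.referenced_data_file.as_ref()?;
    let start = entry.content_offset?;
    let len = entry.content_size_in_bytes?;
    // Assumption: an overflowing range is treated like a missing one.
    let end = start.checked_add(len)?;
    Some(start..end)
}

/// Slices the blob payload out of an in-memory copy of the file.
/// Note that the payload is not expected to start with the `PFA1` magic.
fn read_blob<'a>(file: &'a [u8], range: &Range<u64>) -> Option<&'a [u8]> {
    let start = usize::try_from(range.start).ok()?;
    let end = usize::try_from(range.end).ok()?;
    file.get(start..end)
}

/// Builds a `path#offset:length` key so that several blobs stored in the
/// same Puffin file are tracked independently. Without a range, the plain
/// path is used.
fn load_key(entry: &DeleteFileRef) -> String {
    match (entry.content_offset, entry.content_size_in_bytes) {
        (Some(offset), Some(len)) => format!("{}#{}:{}", entry.file_path, offset, len),
        _ => entry.file_path.clone(),
    }
}
```

Keying on the range rather than on the path alone means two entries that point at different blobs in one Puffin file never share a cached result, which is the behaviour the last-but-one change in the list is after.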
