Re: [PR] Write Deletion Vectors [iceberg-python]

via GitHub Tue, 09 Dec 2025 18:18:41 -0800


ebyhr commented on code in PR #2822:
URL: https://github.com/apache/iceberg-python/pull/2822#discussion_r2604941480



##########
pyiceberg/table/puffin.py:
##########
@@ -114,3 +147,97 @@ def __init__(self, puffin: bytes) -> None:
 
     def to_vector(self) -> dict[str, "pa.ChunkedArray"]:
         return {path: _bitmaps_to_chunked_array(bitmaps) for path, bitmaps in 
self._deletion_vectors.items()}
+
+
+class PuffinWriter:

Review Comment:
   Iceberg spec recommends writing `created-by` property in Puffin files: 
https://iceberg.apache.org/puffin-spec/#deletion-vector-v1-blob-type. Could you 
please write the property? 



##########
tests/table/test_puffin.py:
##########
@@ -71,3 +71,78 @@ def test_map_high_vals() -> None:
 
     with pytest.raises(ValueError, match="Key 4022190063 is too large, max 
2147483647 to maintain compatibility with Java impl"):
         _ = _deserialize_bitmap(puffin)
+
+
+def test_puffin_round_trip() -> None:
+    # Define some deletion positions for multiple files
+    deletions1 = [10, 20, 30]

Review Comment:
   Iceberg deletion vectors expect 1:1 for DV:path. The current PuffinWriter 
may violate the expectation. 
   
   For instance, the following code writes 2 blobs without merging them. Do you 
fix this behavior as a follow-up, or the caller should provide the merged DV? 
   ```py
       deletions1a = [10, 20]
       deletions1b = [30]
   ...
       writer.add(positions=deletions1a, referenced_data_file=file1_path)
       writer.add(positions=deletions1b, referenced_data_file=file1_path)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Write Deletion Vectors [iceberg-python]

Reply via email to