laskoviymishka opened a new issue, #997:
URL: https://github.com/apache/iceberg-go/issues/997

   Parent: #589
   
   Depends on #866 (the DV bitmap reader should land first to settle the codec). v3 prefers DVs over Parquet position-delete files. iceberg-go would be the first non-Java client with a DV writer: pyiceberg writes Parquet position deletes today, and iceberg-rust hasn't shipped this either.
   
   Shape:
   
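   ```go
   // New file: table/internal/dv_writer.go

   // DVWriter buffers deleted row positions per data file and writes them
   // as deletion-vector-v1 blobs into a single Puffin file.
   type DVWriter struct {
       fs    io.IO         // FileIO used to write the Puffin file
       blobs []puffin.Blob // one DV blob per referenced data file
   }

   // Add records deleted row positions for the data file at dataFilePath.
   func (w *DVWriter) Add(dataFilePath string, positions []int64) error

   // Flush writes the Puffin file to location and returns one delete file per
   // referenced data file. All entries share the Puffin path but differ in
   // ReferencedDataFile, ContentOffset and ContentSizeInBytes.
   func (w *DVWriter) Flush(ctx context.Context, location string) ([]DataFile, error)
   ```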
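Each DV in a Puffin file gets its own delete-file manifest entry, so a flush that covers several data files yields several entries pointing into the same Puffin file. Below is a minimal sketch of that offset bookkeeping. It assumes blobs are written back to back immediately after the 4-byte `PFA1` header magic and leaves out the Puffin footer. `layoutBlobs` and `dvEntry` are hypothetical names, not iceberg-go APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// puffinMagic opens every Puffin file, so the first blob starts at offset 4.
var puffinMagic = []byte("PFA1")

// dvEntry holds the per-DV fields a delete-file manifest entry needs.
// Hypothetical type for illustration only.
type dvEntry struct {
	ReferencedDataFile string
	ContentOffset      int64
	ContentSizeInBytes int64
}

// layoutBlobs orders serialized DV blobs by data file path, places them back
// to back after the Puffin header magic, and returns one entry per data file.
func layoutBlobs(blobs map[string][]byte) (payload []byte, entries []dvEntry) {
	paths := make([]string, 0, len(blobs))
	for p := range blobs {
		paths = append(paths, p)
	}
	sort.Strings(paths) // deterministic blob order

	payload = append(payload, puffinMagic...)
	for _, p := range paths {
		entries = append(entries, dvEntry{
			ReferencedDataFile: p,
			ContentOffset:      int64(len(payload)),
			ContentSizeInBytes: int64(len(blobs[p])),
		})
		payload = append(payload, blobs[p]...)
	}
	// A real writer would append the Puffin footer (magic, JSON footer
	// payload, payload size, flags, magic) before uploading the file.
	return payload, entries
}

func main() {
	_, entries := layoutBlobs(map[string][]byte{
		"data/b.parquet": make([]byte, 20),
		"data/a.parquet": make([]byte, 10),
	})
	for _, e := range entries {
		fmt.Printf("%s offset=%d size=%d\n", e.ReferencedDataFile, e.ContentOffset, e.ContentSizeInBytes)
	}
}
```

Here `data/a.parquet` lands at offset 4 and `data/b.parquet` at offset 14. Sorting by path keeps the output reproducible across runs, which makes cross-client comparisons easier.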
   
   `Flush` serializes each data file's positions as a Roaring 64-bit bitmap → Puffin blob (`blob_type=deletion-vector-v1`) → manifest entry with `content=1, file_format=PUFFIN, ReferencedDataFile, ContentOffset, ContentSizeInBytes`. The resulting delete files commit through the existing `RowDelta.AddDeletes()` API, so the rest of the producer stack is unchanged.
   
   Add a table property `write.delete.format=position|dv`. On v3 tables the default flips to `dv`; v2 stays on `position`. When the property is `dv`, the existing position-delete writer in `table/internal/parquet_files.go` is bypassed in favor of `DVWriter`.
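The property gating could look roughly like the following. `resolveDeleteFormat` is a hypothetical helper that applies the defaults described above. Rejecting `dv` on tables below v3 is an added assumption, since deletion vectors are only defined for v3.

```go
package main

import "fmt"

// resolveDeleteFormat picks the delete-file format for a write: an explicit
// write.delete.format wins, otherwise v3+ defaults to "dv" and older
// versions to "position". Requesting "dv" below v3 is rejected (assumption).
func resolveDeleteFormat(props map[string]string, formatVersion int) (string, error) {
	const key = "write.delete.format"
	v, ok := props[key]
	if !ok {
		if formatVersion >= 3 {
			return "dv", nil
		}
		return "position", nil
	}
	switch v {
	case "position":
		return v, nil
	case "dv":
		if formatVersion < 3 {
			return "", fmt.Errorf("%s=dv requires format version 3, table is v%d", key, formatVersion)
		}
		return v, nil
	default:
		return "", fmt.Errorf("invalid %s %q: want position or dv", key, v)
	}
}

func main() {
	for _, version := range []int{2, 3} {
		f, _ := resolveDeleteFormat(map[string]string{}, version)
		fmt.Printf("v%d default: %s\n", version, f)
	}
	_, err := resolveDeleteFormat(map[string]string{"write.delete.format": "dv"}, 2)
	fmt.Println(err)
}
```

Failing fast on an invalid combination keeps the decision in one place, so the producer only has to branch on the resolved value when choosing between the Parquet position-delete writer and `DVWriter`.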
   
   The scope is large enough that this should land across multiple PRs. One reasonable split: Roaring serialization and the writer skeleton, then producer wiring and property gating, then cross-client tests. Discussion of the breakdown is welcome in this thread before any code lands.
   
   Spec: [Iceberg deletion vectors](https://iceberg.apache.org/spec/#deletion-vectors), [Puffin format](https://iceberg.apache.org/puffin-spec/), [RoaringBitmap serialization](https://github.com/RoaringBitmap/RoaringFormatSpec). Cross-client coverage: write a DV with iceberg-go, read it back with Java and pyiceberg, and assert that the filtered rows match.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

