laskoviymishka opened a new issue, #997:
URL: https://github.com/apache/iceberg-go/issues/997
Parent: #589
Depends on #866 (DV bitmap reader landing first to settle the codec). v3
prefers DVs over Parquet position-delete files. Iceberg-go would be the first
non-Java client with a DV writer — pyiceberg writes Parquet position-deletes
today, and iceberg-rust hasn't shipped this either.
Shape:
```go
// New file: table/internal/dv_writer.go
type DVWriter struct {
	fs    io.IO
	blobs []puffin.Blob
}

func (w *DVWriter) Add(dataFilePath string, positions []int64) error
func (w *DVWriter) Flush(ctx context.Context, location string) (DataFile, error)
```
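To make the `Add` side concrete, here is a minimal sketch of how positions could be accumulated per data file before `Flush`. `positionBuffer` and its methods are hypothetical names for illustration only, not part of iceberg-go, and the real writer would likely feed a Roaring bitmap directly rather than a Go map.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// positionBuffer is a hypothetical stand-in for the state a DV writer
// might keep between Add and Flush: one set of row positions per data file.
type positionBuffer struct {
	byFile map[string]map[int64]struct{}
}

func newPositionBuffer() *positionBuffer {
	return &positionBuffer{byFile: make(map[string]map[int64]struct{})}
}

// Add records deleted row positions for a data file. All positions are
// validated before any are stored, so a failed call leaves the buffer unchanged.
func (b *positionBuffer) Add(dataFilePath string, positions []int64) error {
	if dataFilePath == "" {
		return errors.New("data file path must not be empty")
	}
	for _, p := range positions {
		if p < 0 {
			return fmt.Errorf("negative row position %d for %s", p, dataFilePath)
		}
	}
	set, ok := b.byFile[dataFilePath]
	if !ok {
		set = make(map[int64]struct{})
		b.byFile[dataFilePath] = set
	}
	for _, p := range positions {
		set[p] = struct{}{}
	}
	return nil
}

// Sorted returns the deduplicated positions for a data file in ascending order.
func (b *positionBuffer) Sorted(dataFilePath string) []int64 {
	set := b.byFile[dataFilePath]
	out := make([]int64, 0, len(set))
	for p := range set {
		out = append(out, p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	buf := newPositionBuffer()
	_ = buf.Add("s3://bucket/data/a.parquet", []int64{5, 1, 5})
	_ = buf.Add("s3://bucket/data/a.parquet", []int64{3})
	fmt.Println(buf.Sorted("s3://bucket/data/a.parquet")) // prints [1 3 5]
}
```

Deduplicating across repeated `Add` calls means callers do not have to batch all deletes for one data file into a single call.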
The writer serializes a 64-bit Roaring bitmap into a Puffin blob
(`blob_type=deletion-vector-v1`) and emits a manifest entry with `content=1`,
`file_format=PUFFIN`, `ReferencedDataFile`, `ContentOffset`, and
`ContentSizeInBytes`. It commits through the existing `RowDelta.AddDeletes()`
API, so the rest of the producer stack is unchanged.
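For reference, a rough sketch of the blob envelope around the serialized bitmap. It assumes the layout in the Puffin spec's `deletion-vector-v1` section: a 4-byte big-endian length of magic plus vector, the magic bytes `D1 D3 39 64`, the vector itself, then a 4-byte big-endian CRC-32 over magic plus vector. `frameDVBlob` is a hypothetical helper name, and the layout should be checked against the spec and #866 before anyone relies on it.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// dvMagic is the 4-byte magic sequence that precedes the serialized vector
// (assumed from the Puffin spec's deletion-vector-v1 blob description).
var dvMagic = []byte{0xD1, 0xD3, 0x39, 0x64}

// frameDVBlob wraps an already-serialized 64-bit Roaring bitmap in the
// assumed envelope: length | magic | vector | CRC-32.
// It is a hypothetical helper, not an existing iceberg-go function.
func frameDVBlob(vector []byte) []byte {
	// The length and the checksum both cover magic + vector.
	body := make([]byte, 0, len(dvMagic)+len(vector))
	body = append(body, dvMagic...)
	body = append(body, vector...)

	var u32 [4]byte
	out := make([]byte, 0, 4+len(body)+4)

	binary.BigEndian.PutUint32(u32[:], uint32(len(body)))
	out = append(out, u32[:]...)
	out = append(out, body...)

	binary.BigEndian.PutUint32(u32[:], crc32.ChecksumIEEE(body))
	out = append(out, u32[:]...)
	return out
}

func main() {
	// Assumption: an empty 64-bit bitmap serializes as an 8-byte count of
	// zero 32-bit sub-bitmaps, so eight zero bytes stand in for the vector.
	emptyVector := make([]byte, 8)
	blob := frameDVBlob(emptyVector)
	fmt.Printf("blob length: %d bytes\n", len(blob)) // 4 + 4 + 8 + 4 = 20
	fmt.Printf("header: % x\n", blob[:8])
}
```

Keeping the framing separate from the Roaring serialization would let the reader from #866 and this writer share one envelope implementation.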
Add a table property `write.delete.format=position|dv`. On v3 the default
flips to `dv`; v2 stays on `position`. When the property resolves to `dv`, the
existing position-delete writer in `table/internal/parquet_files.go` is
bypassed in favor of `DVWriter`.
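A minimal sketch of how the property could be resolved against the table's format version. `resolveDeleteFormat` and the constant names are hypothetical. Rejecting an explicit `dv` on a pre-v3 table is an assumption added here (DVs are a v3 feature), not something this issue has decided.

```go
package main

import "fmt"

const (
	deleteFormatKey      = "write.delete.format" // property name proposed in this issue
	deleteFormatPosition = "position"
	deleteFormatDV       = "dv"
)

// resolveDeleteFormat picks the delete format for a write.
// Hypothetical helper: the default depends on the format version, and an
// explicit "dv" on a table older than v3 is rejected (an assumption).
func resolveDeleteFormat(props map[string]string, formatVersion int) (string, error) {
	value, ok := props[deleteFormatKey]
	if !ok {
		if formatVersion >= 3 {
			return deleteFormatDV, nil
		}
		return deleteFormatPosition, nil
	}

	switch value {
	case deleteFormatPosition:
		return deleteFormatPosition, nil
	case deleteFormatDV:
		if formatVersion < 3 {
			return "", fmt.Errorf("%s=%s requires format version 3, table is v%d",
				deleteFormatKey, value, formatVersion)
		}
		return deleteFormatDV, nil
	default:
		return "", fmt.Errorf("unsupported %s value %q", deleteFormatKey, value)
	}
}

func main() {
	for _, version := range []int{2, 3} {
		format, err := resolveDeleteFormat(map[string]string{}, version)
		fmt.Printf("v%d default: %s (err=%v)\n", version, format, err)
	}

	_, err := resolveDeleteFormat(map[string]string{deleteFormatKey: "dv"}, 2)
	fmt.Println("explicit dv on v2:", err)
}
```

Resolving the format once, up front, keeps the choice between the Parquet position-delete writer and `DVWriter` in a single place in the producer.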
The scope is large enough that it should land across multiple PRs. One
reasonable split: Roaring serialization plus the writer skeleton first, then
producer wiring plus property gating, then cross-client tests. Discussion of
the breakdown is welcome in this thread before any code lands.
Spec: [Iceberg deletion
vectors](https://iceberg.apache.org/spec/#deletion-vectors), [Puffin
format](https://iceberg.apache.org/puffin-spec/), [RoaringBitmap
serialization](https://github.com/RoaringBitmap/RoaringFormatSpec).
Cross-client coverage: write a DV via iceberg-go, read back by Java/pyiceberg,
assert filtered rows match.