laskoviymishka opened a new issue, #1135: URL: https://github.com/apache/iceberg-go/issues/1135
Today iceberg-go writes deletion vectors (DVs) for v3 unpartitioned tables (#1113), but v3 partitioned tables still fall through to the legacy Parquet position-delete writer with a deprecation warning. The spec mandates DVs on v3 regardless of partitioning, and Java's `BaseDVFileWriter` has no such split.

Real-world example: time-series tables partitioned by month or day, where corrections and backfills routinely span partition boundaries. A single delete commit today emits one Parquet position-delete file per affected partition rather than a single Puffin file with one DV blob per affected data file. The consequences:

- Storage amplifies: Parquet rows of `(file_path, pos)` tuples are much larger than Roaring-compressed bitmaps.
- Reads amplify: one Parquet file open per partition vs O(1) blob seeks per data file.
- The deprecation warning fires on every delete commit until this lands.

Java's `BaseDVFileWriter` is partition-agnostic at the Puffin level: one Puffin file globally per flush, one blob per data file, and partition data propagated to each output DV manifest entry via `withPartition(deletes.partition())`. The "partitioned vs unpartitioned DV" split iceberg-go has today is an implementation choice rooted in `DVWriter.Add` not yet accepting partition metadata, not a spec distinction.

Spec: https://iceberg.apache.org/spec/#deletion-vectors
Java reference: `BaseDVFileWriter` in apache/iceberg
Parent: #589.
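
To make the partition-agnostic shape concrete, here is a minimal Go sketch of the grouping idea only. Every name in it (`posDelete`, `dvBlob`, `dvAccumulator`, `newDVAccumulator`, and this `Add`/`Flush` pair) is hypothetical and is not the iceberg-go API; positions are kept in a plain sorted slice where a real writer would use a Roaring bitmap, and no Puffin bytes are written. It assumes the partition can be carried as an opaque string next to each data file.

```go
package main

import "sort"

// NOTE: all types and functions here are hypothetical illustrations,
// not part of iceberg-go or the Java reference implementation.

// posDelete is one position delete, tagged with the partition of its data file.
type posDelete struct {
	DataFile  string
	Partition string // opaque stand-in for the partition tuple
	Pos       uint64
}

// dvBlob stands in for one DV blob plus the partition for its manifest entry.
type dvBlob struct {
	DataFile  string
	Partition string
	Positions []uint64 // a real writer would store a Roaring bitmap
}

// dvAccumulator keys pending deletes by data file, never by partition.
type dvAccumulator struct {
	blobs map[string]*dvBlob
}

func newDVAccumulator() *dvAccumulator {
	return &dvAccumulator{blobs: make(map[string]*dvBlob)}
}

// Add records a deleted position and remembers the partition alongside the blob.
func (a *dvAccumulator) Add(d posDelete) {
	b, ok := a.blobs[d.DataFile]
	if !ok {
		b = &dvBlob{DataFile: d.DataFile, Partition: d.Partition}
		a.blobs[d.DataFile] = b
	}
	b.Positions = append(b.Positions, d.Pos)
}

// Flush returns the blobs destined for one output file, ordered by data file
// path, with positions sorted and deduplicated. It resets the accumulator.
func (a *dvAccumulator) Flush() []dvBlob {
	out := make([]dvBlob, 0, len(a.blobs))
	for _, b := range a.blobs {
		sort.Slice(b.Positions, func(i, j int) bool { return b.Positions[i] < b.Positions[j] })
		b.Positions = dedupe(b.Positions)
		out = append(out, *b)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].DataFile < out[j].DataFile })
	a.blobs = make(map[string]*dvBlob)
	return out
}

// dedupe removes adjacent duplicates from a sorted slice in place.
func dedupe(s []uint64) []uint64 {
	if len(s) == 0 {
		return s
	}
	out := s[:1]
	for _, v := range s[1:] {
		if v != out[len(out)-1] {
			out = append(out, v)
		}
	}
	return out
}
```

In this sketch, deletes touching data files in two different partitions still come out of a single `Flush` call as two blobs, each carrying its own partition value. That is the property the issue asks for: partitioning affects the per-entry metadata, not how many delete files are produced.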
