> I think it might be worth mentioning the current proposal makes some,
> mostly minor, design choices to try to be compatible with Delta Lake
> deletion vectors.

Yes it does, and thanks for pointing this out, Micah. I think it's worth
considering whether compatibility is important to this community.
I just replied to Piotr on the PR, but I'll adapt some of that response
here to reach the broader community.

I think there is value in supporting compatibility with older Delta
readers, but I acknowledge that this may be my perspective because my
employer has a lot of Delta customers that we are going to support now and
in the future.

The main use case for maintaining compatibility with the Delta format is
that it's hard to move old jobs to new code in a migration. We see a
similar issue in Hive-to-Iceberg migrations, where unknown older readers
prevent migration entirely because they are hard to track down and often
read files directly from the backing object store. I'd like to avoid the
same problem here, where all readers need to be identified and migrated at
the same time. Compatibility with the format those readers expect makes it
possible to maintain Delta metadata for them temporarily. That increases
confidence that things won't randomly break and makes it easier to get
people to move forward.

The second reason for maintaining compatibility is that we want the
formats to become more similar. My hope is that we can work across both
communities and come up with a common metadata format in a future version
-- which explains my interest in smooth migrations. Maintaining
compatibility in cases like this builds trust and keeps our options open:
if we have compatible data layers, then it's easier to build a compatible
metadata layer. I'm hoping that if we make the blob format compatible, we
can get the Delta community to start using Puffin for better
self-describing delete files.

Other people may not share those goals, so I think it helps to consider
what is being compromised for this compatibility. I don't think it is too
much. There are two additional fields:
* A 4-byte length field (that Iceberg doesn't need)
* A 4-byte CRC to validate the contents of the bitmap

There are also differences from how these would have been designed if the
Iceberg community were building this independently (a rough sketch of the
resulting layout follows this list):
* Our initial version didn't include a CRC at all; now that we think one
is useful, compatibility means using a CRC-32 checksum rather than a newer
one
* The Delta format uses big-endian for its fields (or mixed-endian, if you
consider that the RoaringBitmap serialization is little-endian)
* The magic bytes (added to avoid reading the Puffin footer) would have
been different
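
To make the layout concrete, here is a minimal sketch of the write-side
framing in Java. Treat it as illustrative only: the magic value below is a
placeholder, and the exact scope of the length and CRC fields (here
assumed to cover the magic bytes plus the serialized bitmap) is defined by
the spec PR, not by this sketch.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.zip.CRC32;

    public class DeleteVectorFraming {
      // Placeholder value; the real magic bytes are defined in the spec PR.
      private static final byte[] MAGIC = {(byte) 0xD1, (byte) 0xD3, 0x39, 0x64};

      // Wrap an already-serialized RoaringBitmap in the Delta-compatible frame:
      // 4-byte big-endian length, magic bytes, bitmap payload, and a 4-byte
      // big-endian CRC-32 over the magic bytes plus payload (assumed scope).
      static byte[] frame(byte[] bitmap) {
        CRC32 crc = new CRC32();
        crc.update(MAGIC);
        crc.update(bitmap);

        ByteBuffer buf = ByteBuffer.allocate(4 + MAGIC.length + bitmap.length + 4)
            .order(ByteOrder.BIG_ENDIAN); // framing fields are big-endian
        buf.putInt(MAGIC.length + bitmap.length); // the length Iceberg doesn't need
        buf.put(MAGIC);                   // lets readers skip the Puffin footer
        buf.put(bitmap);                  // RoaringBitmap itself is little-endian
        buf.putInt((int) crc.getValue()); // the CRC validating the contents
        return buf.array();
      }
    }

A reader does the inverse: read the length, verify the magic, recompute
the CRC-32 over the magic bytes and payload, and compare it to the
trailing field.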

Overall, I don't think those departures from what we would have done are
unreasonable. It's only 8 extra bytes, and half of them are for a checksum,
which is a good idea anyway.

I'm looking forward to what the rest of the community thinks about this.
Thanks for reviewing the PR!

Ryan


On Sun, Oct 13, 2024 at 10:45 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi
>
> Thanks for the PRs ! I reviewed Anton's document, I will do a pass on the
> PRs.
>
> Imho, it's important to get feedback from query engines: while delete
> vectors are not a problem per se (they are what we already use as an
> internal representation), the use of Puffin files to store them is
> "impactful" for query engines (some engines will probably need to
> implement the Puffin spec (read/write) in a language other than Java,
> for instance Apache Impala).
>
> I like the proposal; I just hope we won't "surprise" some query
> engines with extra work :)
>
> Regards
> JB
>
> On Thu, Oct 10, 2024 at 11:41 PM rdb...@gmail.com <rdb...@gmail.com>
> wrote:
> >
> > Hi everyone,
> >
> > There seems to be broad agreement around Anton's proposal to use
> deletion vectors in Iceberg v3, so I've opened two PRs that update the spec
> with the proposed changes. The first, PR #11238, adds a new Puffin blob
> type, delete-vector-v1, that stores a delete vector. The second, PR #11240,
> updates the Iceberg table spec.
> >
> > Please take a look and comment!
> >
> > Ryan
>
