Hi Iceberg/Impala Team,

We've been working on adding read support for Iceberg V2 tables in Impala. In the first round we're focusing on position deletes.
We are considering several approaches, so I've written a design doc about them:
https://docs.google.com/document/d/1WF_UOanQ61RUuQlM4LaiRWI0YXpPKZ2VEJ8gyJdDyoY/

TL;DR:

The Scan Planning section <https://iceberg.apache.org/spec/#scan-planning> of the Iceberg spec says:

A position delete file must be applied to a data file when all of the following are true:
- The data file’s sequence number is less than or equal to the delete file’s sequence number
- ...

Basically, we would like to do an ANTI JOIN between data files and delete files. We have some trouble with sequence numbers though, as these are not exposed by the Iceberg API.

Does Iceberg allow deleting a data file and then adding a new one with the same name? Probably not, as it would cause all kinds of problems (e.g. time travel issues), and I can see that Iceberg generates unique file names. If the answer is no, then we probably don't even need the sequence number during query execution.

This and other interesting challenges/questions are in the doc. I hope you enjoy reading it!

Cheers,
Zoltan Borok-Nagy
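
P.S. To make the ANTI JOIN idea concrete, here is a minimal sketch in Python. It is not Impala code and not taken from the design doc. It assumes each scanned row carries the path of its data file and its ordinal position in that file, and that position delete records are (file_path, pos) pairs as described in the Iceberg spec. The names Row, PositionDelete, and anti_join are made up for illustration. The sketch deliberately ignores sequence numbers, which is only safe if a data file path is never reused, i.e. exactly the question asked above.

```python
from typing import Iterable, Iterator, NamedTuple


class Row(NamedTuple):
    file_path: str  # data file the row was read from
    pos: int        # zero-based ordinal position of the row in that file
    value: str      # hypothetical payload column


class PositionDelete(NamedTuple):
    file_path: str  # data file the delete applies to
    pos: int        # position of the deleted row in that file


def anti_join(rows: Iterable[Row],
              deletes: Iterable[PositionDelete]) -> Iterator[Row]:
    """Yield only the rows that have no matching (file_path, pos) delete.

    Assumption: file paths are unique over the table's lifetime, so the
    sequence number check from the spec is not needed here.
    """
    deleted = {(d.file_path, d.pos) for d in deletes}
    for row in rows:
        if (row.file_path, row.pos) not in deleted:
            yield row


rows = [
    Row("a.parquet", 0, "x"),
    Row("a.parquet", 1, "y"),
    Row("b.parquet", 0, "z"),
]
deletes = [PositionDelete("a.parquet", 1)]

surviving = list(anti_join(rows, deletes))
```

Here the second row of a.parquet is dropped and the other two rows survive. Building a hash set on the delete side mirrors a hash anti join with the delete files as the build side; a real implementation would choose the join strategy based on the relative sizes of the two inputs.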