Hi everyone, sorry it took a while for me to get these notes sent out. Please reply with discussion or corrections.

*Attendees*:

Ryan Blue
Anjali Norwood
Jacques Nadeau
Anton Okolnychyi
David Muto
Erik Wright
Owen O’Malley

*Topics*:

- Quick summary of the current approach with sequence numbers
- Should global delete be supported?
- What is the scope of deletes within a snapshot?
- Should synthetic delete files use sequence numbers?
- What should be used as a record identifier? Does offset work?
- What is the format of a delete diff?
- What is the scope of a delete diff in a table?
- How will per-file delete diffs work?
- Next steps

*Discussion*:

- Quick summary: we agree that Iceberg will add sequence numbers to metadata to scope deletes across snapshots (time). Deletes will apply to all files with an earlier sequence number. Iceberg will use two formats, a synthetic delete diff using file/offset and an equality delete diff using a set of values to match to row data.
- Global deletes
  - Ryan: this is for GDPR. Data layout is for query performance, not delete performance. Deleting records that could be anywhere should be possible without eagerly scanning all data files in tables that are tens of petabytes
  - Owen: customers have this use case as well, it should be supported
  - Erik (I think): these are slow to apply because they probably are not sorted
  - Jacques: hash-set deletes are not a format constraint, they are an engine constraint
  - Ryan: we can’t always depend on sorting. That is an optimization, but engines may need to use a hash-set
  - Owen (I think): table maintenance and delete compaction are important to keep merge costs low
- Scope of deletes within a snapshot:
  - Erik suggested using all the same metadata as data files
  - Partition data will be used to scope deletes within a snapshot.
- Sequence numbers and synthetic delete files:
  - Anton: will sequence numbers be used?
  - Ryan: Yes. More ways to eliminate delete diffs that do not need to be applied is good for performance. Simpler to always apply the same rules, too.
  - Also, reusing files (or un-deleting files) could be a correctness problem.
- Synthetic deletes and offsets: What should be used as a record identifier
  - The confusion was that “offset” could be interpreted as byte offset in a file. The intent was row position. Will use “position” from now on.
- What is the format of a delete diff?
  - Equality deletes: the data columns to match. For example, a, b for a = ? and b = ? with ? filled in by row data
  - Positional deletes: file name and row position (sparse format, multiple data files covered by a single delete file)
  - Jacques (I think): How would a positional delete file apply to just one data file?
  - Erik: Delete files should also have column lower/upper bounds for the deleted rows. These can be copied and merged from the data files, so stats can also be used to eliminate deletes that do not need to be applied
  - Erik: The latest write-up uses all existing data file metadata fields, unchanged
  - Ryan: that’s a clever idea and would help performance
  - It wouldn’t be possible to add a filename field to lower/upper bounds, so this wouldn’t work for scoping to a single file
  - Should a single file name, list of file names, or bloom filter be added to identify data files? Maybe as a future optimization
- How will scope work for global deletes?
  - Ryan: use a manifest with a different partition spec. The unpartitioned spec for global delete files.
  - Erik: that could be used to apply deletes to other levels as well. For example, a table partitioned with bucketing could encode deletes with a partition spec that doesn’t include the bucketing level to apply to all buckets.
  - Ryan: that is difficult because Iceberg would need to decide that one partition contains another to apply diffs across partitions. This complicates scope, so maybe we should only allow global and partition-level for now.
- Next steps:
  - Add preconditions for table format version (done!)
  - Add sequence numbers to metadata and update the spec (compatible with v1)
  - Define and document the delete diff formats (Anton and Erik)
  - Update readers to apply deletes

-- 
Ryan Blue
Software Engineer
Netflix
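
As an illustration of the rules summarized in these notes, here is a minimal Python sketch of how a reader might decide which delete diffs apply to a data file and then filter rows. It is only a sketch under stated assumptions, not Iceberg's implementation: the names `DataFile`, `PositionDelete`, `EqualityDelete`, `applies_to`, `is_deleted`, and `live_rows` are hypothetical, partitions are modeled as plain strings, a `None` partition stands in for a global (unpartitioned) delete, and "earlier sequence number" is read as strictly less than the delete's sequence number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFile:
    path: str
    partition: str
    sequence_number: int
    rows: list  # each row is a dict of column name -> value


@dataclass
class PositionDelete:
    sequence_number: int
    partition: Optional[str]  # None models a global (unpartitioned) delete
    positions: set  # {(data file path, row position)}; one file may cover many data files


@dataclass
class EqualityDelete:
    sequence_number: int
    partition: Optional[str]  # None models a global (unpartitioned) delete
    columns: tuple  # e.g. ("a", "b") for a = ? and b = ?
    values: set  # {(value of a, value of b)}, the values filled in for each ?


def applies_to(delete, data_file):
    """Scope check: same partition (or global), and the data file is older."""
    in_scope = delete.partition is None or delete.partition == data_file.partition
    # Assumption: "earlier" means strictly lower sequence number.
    return in_scope and data_file.sequence_number < delete.sequence_number


def is_deleted(delete, data_file, position, row):
    """Match one row against one delete diff."""
    if isinstance(delete, PositionDelete):
        return (data_file.path, position) in delete.positions
    # Equality delete: hash-set lookup on the matched column values.
    key = tuple(row[column] for column in delete.columns)
    return key in delete.values


def live_rows(data_file, deletes):
    """Return the rows of data_file that survive all applicable deletes."""
    applicable = [d for d in deletes if applies_to(d, data_file)]
    return [
        row
        for position, row in enumerate(data_file.rows)
        if not any(is_deleted(d, data_file, position, row) for d in applicable)
    ]
```

With this model, a data file written after a delete (higher sequence number) is never filtered by it, which is why re-adding a row with the same key after an equality delete keeps the new row. The equality path uses a hash-set lookup, the engine-side fallback mentioned in the discussion for deletes that are not sorted.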