Hi folks, I've been on holiday (and will be again next week), but I've started taking steps internally to dedicate engineering time to this project. Starting around the last week of August, I expect to be able to spend meaningful time on it each week.
On Thu, Jul 18, 2019 at 3:47 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
> Hi everyone, sorry it took a while for me to get these notes sent out.
> Please reply with discussion or corrections.
>
> *Attendees*:
>
> Ryan Blue
> Anjali Norwood
> Jacques Nadeau
> Anton Okolnychyi
> David Muto
> Erik Wright
> Owen O’Malley
>
> *Topics*:
>
> - Quick summary of the current approach with sequence numbers
> - Should global deletes be supported?
> - What is the scope of deletes within a snapshot?
> - Should synthetic delete files use sequence numbers?
> - What should be used as a record identifier? Does offset work?
> - What is the format of a delete diff?
> - What is the scope of a delete diff in a table?
> - How will per-file delete diffs work?
> - Next steps
>
> *Discussion*:
>
> - Quick summary: we agree that Iceberg will add sequence numbers to
>   metadata to scope deletes across snapshots (time). Deletes will apply to
>   all files with an earlier sequence number. Iceberg will use two
>   formats: a synthetic delete diff using file/offset, and an equality
>   delete diff using a set of values to match against row data.
> - Global deletes
>   - Ryan: this is for GDPR. Data layout is for query performance, not
>     delete performance. Deleting records that could be anywhere should be
>     possible without eagerly scanning all data files in tables that are
>     tens of petabytes.
>   - Owen: customers have this use case as well; it should be supported.
>   - Erik (I think): these are slow to apply because they are probably
>     not sorted.
>   - Jacques: hash-set deletes are not a format constraint; they are an
>     engine constraint.
>   - Ryan: we can’t always depend on sorting. That is an optimization,
>     but engines may need to use a hash set.
>   - Owen (I think): table maintenance and delete compaction are
>     important to keep merge costs low.
> - Scope of deletes within a snapshot:
>   - Erik suggested using all the same metadata as data files.
>   - Partition data will be used to scope deletes within a snapshot.
> - Sequence numbers and synthetic delete files:
>   - Anton: will sequence numbers be used?
>   - Ryan: Yes. Having more ways to eliminate delete diffs that do not
>     need to be applied is good for performance. It is also simpler to
>     always apply the same rules.
>   - Also, reusing files (or un-deleting files) could be a correctness
>     problem.
> - Synthetic deletes and offsets: what should be used as a record
>   identifier?
>   - The confusion was that “offset” could be interpreted as a byte
>     offset in a file. The intent was row position. We will use
>     “position” from now on.
> - What is the format of a delete diff?
>   - Equality deletes: the data columns to match. For example, a, b
>     for a = ? and b = ?, with ? filled in by row data.
>   - Positional deletes: file name and row position (sparse format;
>     multiple data files covered by a single delete file).
>   - Jacques (I think): how would a positional delete file apply to
>     just one data file?
>   - Erik: delete files should also have column lower/upper bounds for
>     the deleted rows. These can be copied and merged for data files, so
>     stats can also be used to eliminate deletes that do not need to be
>     applied.
>   - Erik: the latest write-up uses all existing data file metadata
>     fields, unchanged.
>   - Ryan: that’s a clever idea and would help performance.
>   - It wouldn’t be possible to add a file name field to lower/upper
>     bounds, so this wouldn’t work for scoping to a single file.
>   - Should a single file name, a list of file names, or a bloom filter
>     be added to identify data files? Maybe as a future optimization.
> - How will scope work for global deletes?
>   - Ryan: use a manifest with a different partition spec: the
>     unpartitioned spec for global delete files.
>   - Erik: that could be used to apply deletes at other levels as
>     well. For example, a table partitioned with bucketing could encode
>     deletes with a partition spec that doesn’t include the bucketing
>     level, so they apply to all buckets.
>   - Ryan: that is difficult because Iceberg would need to decide that
>     one partition contains another to apply diffs across partitions.
>     This complicates scope, so maybe we should only allow global and
>     partition-level deletes for now.
> - Next steps:
>   - Add preconditions for table format version (done!)
>   - Add sequence numbers to metadata and update the spec (compatible
>     with v1)
>   - Define and document the delete diff formats (Anton and Erik)
>   - Update readers to apply deletes
>
> --
> Ryan Blue
> Software Engineer
> Netflix