Row-level delete sync notes - July 2019

Ryan Blue Thu, 18 Jul 2019 12:47:12 -0700

Hi everyone, sorry it took a while for me to get these notes sent out.
Please reply with discussion or corrections.


*Attendees*:

Ryan Blue
Anjali Norwood
Jacques Nadeau
Anton Okolnychyi
David Muto
Erik Wright
Owen O’Malley

*Topics*:

   - Quick summary of the current approach with sequence numbers
   - Should global delete be supported?
   - What is the scope of deletes within a snapshot?
   - Should synthetic delete files use sequence numbers?
   - What should be used as a record identifier? Does offset work?
   - What is the format of a delete diff?
   - What is the scope of a delete diff in a table?
   - How will per-file delete diffs work?
   - Next steps

*Discussion*:

   - Quick summary: we agree that Iceberg will add sequence numbers to
   metadata to scope deletes across snapshots (time). Deletes will apply to
   all files with an earlier sequence number. Iceberg will use two
   formats, a synthetic delete diff using file/offset and an equality delete
   diff using a set of values to match to row data.
   - Global deletes
      - Ryan: this is for GDPR. Data layout is for query performance, not
      delete performance. Deleting records that could be anywhere should be
      possible without eagerly scanning all data files in tables that are
      tens of petabytes
      - Owen: customers have this use case as well, it should be supported
      - Erik (I think): these are slow to apply because they probably are
      not sorted
      - Jacques: hash-set deletes are not a format constraint, it is an
      engine constraint
      - Ryan: we can’t always depend on sorting. That is an optimization,
      but engines may need to use a hash-set
      - Owen (I think): table maintenance and delete compaction are
      important to keep merge costs low
   - Scope of deletes within a snapshot:
      - Erik suggested using all the same metadata as data files
      - Partition data will be used to scope deletes within a snapshot.
   - Sequence numbers and synthetic delete files:
      - Anton: will sequence numbers be used?
      - Ryan: Yes. More ways to eliminate delete diffs that do not need to
      be applied is good for performance. Simpler to always apply the same
      rules, too.
      - Also, reusing files (or un-deleting files) could be a correctness
      problem.
   - Synthetic deletes and offsets: What should be used as a record
   identifier?
      - The confusion was that “offset” could be interpreted as byte offset
      in a file. The intent was row position. Will use “position” from now on.
   - What is the format of a delete diff?
      - Equality deletes: the data columns to match. For example, a, b for a
      = ? and b = ? with ? filled in by row data
      - Positional deletes: file name and row position (sparse format,
      multiple data files covered by a single delete file)
      - Jacques (I think): How would a positional delete file apply to just
      one data file?
      - Erik: Delete files should also have column lower/upper bounds for
      the deleted rows. This can be copied and merged for data files to also
      use stats to eliminate deletes that do not need to be applied
      - Erik: The latest write-up uses all existing data file metadata
      fields, unchanged
      - Ryan: that’s a clever idea and would help performance
      - It wouldn’t be possible to add a filename field to lower/upper
      bounds, so this wouldn’t work for scoping to a single file
      - Should a single file name, list of file names, or bloom filter be
      added to identify data files? Maybe as a future optimization
   - How will scope work for global deletes?
      - Ryan: use a manifest with a different partition spec. The
      unpartitioned spec for global delete files.
      - Erik: that could be used to apply deletes to other levels as well.
      For example, a table partitioned with bucketing could encode deletes
      with a partition spec that doesn’t include the bucketing level to apply
      to all buckets.
      - Ryan: that is difficult because Iceberg would need to decide that
      one partition contains another to apply diffs across partitions. This
      complicates scope, so maybe we should only allow global and
      partition-level for now.
   - Next steps:
      - Add preconditions for table format version (done!)
      - Add sequence numbers to metadata and update the spec (compatible
      with v1)
      - Define and document the delete diff formats (Anton and Erik)
      - Update readers to apply deletes
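
To make the summary above concrete, here is a rough Python sketch of how a reader might apply the two delete diff formats using sequence numbers. This is only an illustration of the rules as discussed, not Iceberg code: the class names (DataFile, PositionDelete, EqualityDelete), their fields, and the live_rows helper are all hypothetical, and rows are modeled as plain dicts held in memory.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    sequence_number: int
    rows: list  # one dict per row; list index stands in for row position


@dataclass
class PositionDelete:
    sequence_number: int
    # (data file path, row position) pairs; a single delete file may
    # cover rows in several data files (the "sparse format" in the notes)
    positions: set


@dataclass
class EqualityDelete:
    sequence_number: int
    columns: tuple  # columns to match, e.g. ("a", "b") for a = ? and b = ?
    values: set     # tuples of values, one tuple per deleted key


def live_rows(data_file, position_deletes, equality_deletes):
    """Return the rows of data_file that survive all applicable deletes."""
    # Rule from the notes: a delete applies only to files with an
    # earlier sequence number than the delete itself.
    pos = [d for d in position_deletes
           if d.sequence_number > data_file.sequence_number]
    eq = [d for d in equality_deletes
          if d.sequence_number > data_file.sequence_number]

    result = []
    for position, row in enumerate(data_file.rows):
        if any((data_file.path, position) in d.positions for d in pos):
            continue  # removed by a positional delete
        if any(tuple(row[c] for c in d.columns) in d.values for d in eq):
            continue  # removed by an equality delete (hash-set lookup)
        result.append(row)
    return result
```

The equality check here uses a hash set, which matches the point that engines cannot always depend on sorted deletes. A sorted merge would be an optimization on top of the same rule.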

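The scoping ideas from the discussion (sequence numbers, partition data, and Erik's lower/upper bounds on delete files) could be combined into one planning check along the following lines. Again this is a sketch under assumptions: FileMeta and delete_may_apply are made-up names, a partition of None stands in for a delete file written with the unpartitioned spec (a global delete), and only global and partition-level scope are modeled, per Ryan's suggestion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMeta:
    sequence_number: int
    partition: Optional[tuple]  # None models the unpartitioned spec (global)
    lower: dict                 # column name -> lower bound
    upper: dict                 # column name -> upper bound


def delete_may_apply(delete: FileMeta, data: FileMeta) -> bool:
    """Return False when the delete file can be skipped for this data file."""
    # Time scope: deletes apply only to files with an earlier sequence number.
    if delete.sequence_number <= data.sequence_number:
        return False

    # Partition scope: a global delete applies everywhere; otherwise the
    # delete must be in the same partition as the data file.
    if delete.partition is not None and delete.partition != data.partition:
        return False

    # Stats: if the bounds are disjoint on any column that both files
    # track, no deleted row can be present in the data file.
    shared = (delete.lower.keys() & delete.upper.keys()
              & data.lower.keys() & data.upper.keys())
    for col in shared:
        if delete.upper[col] < data.lower[col] or delete.lower[col] > data.upper[col]:
            return False

    return True
```

A check like this only eliminates delete files that cannot match. Any delete file that passes still has to be applied row by row.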
-- 
Ryan Blue
Software Engineer
Netflix
