I don’t see that we need [sequence numbers] for file/offset-deletes, since
they apply to a specific file. They’re not harmful, but they don’t seem
relevant.

These delete files will probably contain a path and an offset and could
contain deletes for multiple files. In that case, the sequence number can
be used to eliminate delete files that don’t need to be applied to a
particular data file, just like the column equality deletes. Likewise, it
can be used to drop the delete files when there are no data files with an
older sequence number.
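The filtering above can be sketched in a few lines. This is a hypothetical illustration, not Iceberg's API. It assumes that a delete file applies only to data files with a strictly smaller sequence number, as with the equality deletes, and that a delete file lists (path, offset) pairs for one or more data files. `DataFile`, `PositionDeleteFile`, `deleted_offsets`, and `can_drop` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    seq: int  # sequence number of the commit that added the file


@dataclass(frozen=True)
class PositionDeleteFile:
    seq: int
    # (data file path, row offset) pairs; one delete file may cover many data files
    entries: Tuple[Tuple[str, int], ...]


def applies_to(delete: PositionDeleteFile, data: DataFile) -> bool:
    # Assumption: a delete only affects data files committed before it.
    return data.seq < delete.seq


def deleted_offsets(data: DataFile, deletes: List[PositionDeleteFile]) -> Set[int]:
    """Row offsets to skip in `data`; delete files that cannot apply are never scanned."""
    offsets: Set[int] = set()
    for d in deletes:
        if not applies_to(d, data):
            continue
        offsets.update(off for path, off in d.entries if path == data.path)
    return offsets


def can_drop(delete: PositionDeleteFile, live_data: List[DataFile]) -> bool:
    """A delete file can be removed once no live data file is older than it."""
    return not any(applies_to(delete, f) for f in live_data)


a = DataFile("a.parquet", seq=1)
b = DataFile("b.parquet", seq=3)
d = PositionDeleteFile(seq=2, entries=(("a.parquet", 5), ("a.parquet", 9), ("c.parquet", 2)))
print(sorted(deleted_offsets(a, [d])))  # [5, 9]
print(deleted_offsets(b, [d]))          # set(): b was written after the delete file
print(can_drop(d, [b]))                 # True once a.parquet and c.parquet are gone
```

The sequence-number check runs before any entries are read. That is what makes it cheap to skip whole delete files for a given data file.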

I don’t understand the purpose of the min sequence number, nor what the
“min data seq” is.

Min sequence number would be used for pruning delete files without reading
all the manifests to find out if there are old data files. If no manifest
with data for a partition contains a file older than some sequence number
N, then any delete file with a sequence number < N can be removed.

The “min data seq” is the minimum sequence number of a data file. That
seems like what we actually want for the pruning I described above.
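A sketch of that pruning, assuming each manifest carries a per-partition minimum data sequence number in a summary. The `ManifestSummary` type and its `min_data_seq` field are hypothetical. The point is that only the summaries are read, never the manifest entries. The removal condition is the one stated above: with N as the partition's minimum data sequence number, any delete file with a sequence number < N can go.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ManifestSummary:
    partition: str
    # Hypothetical per-partition stat: min sequence number of live data files,
    # or None if the manifest holds no data files for that partition.
    min_data_seq: Optional[int]


@dataclass(frozen=True)
class DeleteFile:
    partition: str
    seq: int


def partition_min_data_seq(manifests: List[ManifestSummary], partition: str) -> Optional[int]:
    seqs = [m.min_data_seq for m in manifests
            if m.partition == partition and m.min_data_seq is not None]
    return min(seqs) if seqs else None


def removable_deletes(manifests: List[ManifestSummary],
                      deletes: List[DeleteFile]) -> List[DeleteFile]:
    """Delete files that can no longer apply to any data file, found from summaries only."""
    result = []
    for d in deletes:
        n = partition_min_data_seq(manifests, d.partition)
        # No data left in the partition, or every data file has seq >= N > d.seq.
        if n is None or d.seq < n:
            result.append(d)
    return result


manifests = [ManifestSummary("p=1", 4), ManifestSummary("p=1", 6), ManifestSummary("p=2", 2)]
deletes = [DeleteFile("p=1", 3), DeleteFile("p=1", 5), DeleteFile("p=2", 3), DeleteFile("p=3", 1)]
print(removable_deletes(manifests, deletes))
# [DeleteFile(partition='p=1', seq=3), DeleteFile(partition='p=3', seq=1)]
```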

Off the top of my head, [supporting non-key delete] requires adding
additional information to the manifest file, indicating the columns that
are used for the deletion. Only equality would be supported; if multiple
columns were used, they would be combined with boolean-and. I don’t see
anything too tricky about it.

Yes, exactly. I actually phrased it wrong initially. I think it would be
simple to extend the equality deletes to do this. We just need a way to
have global scope, not just partition scope.
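A hypothetical sketch of an equality delete keyed on several columns. Equality on each listed column is combined with boolean AND, and the delete removes every row it matches, so keys need not be unique. Two things are assumptions for illustration: `partition=None` stands for global scope, and the delete applies only to data with a smaller sequence number. None of the names below are an actual API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class EqualityDeleteFile:
    seq: int
    columns: Tuple[str, ...]             # deletion columns, recorded in the manifest
    values: Tuple[Tuple[Any, ...], ...]  # one value tuple per deleted key
    partition: Optional[str] = None      # None means global scope (assumption)


def matches(row: Row, delete: EqualityDeleteFile) -> bool:
    # A row matches a key when every column is equal (boolean AND).
    return any(
        all(row.get(col) == val for col, val in zip(delete.columns, key))
        for key in delete.values
    )


def apply_deletes(rows: List[Row], data_seq: int, data_partition: str,
                  deletes: List[EqualityDeleteFile]) -> List[Row]:
    """Drop every row matched by an applicable delete; keys need not be unique."""
    applicable = [d for d in deletes
                  if data_seq < d.seq and d.partition in (None, data_partition)]
    return [r for r in rows if not any(matches(r, d) for d in applicable)]


rows = [
    {"id": 1, "region": "us", "v": "a"},
    {"id": 1, "region": "eu", "v": "b"},
    {"id": 1, "region": "us", "v": "c"},
]
delete = EqualityDeleteFile(seq=5, columns=("id", "region"), values=((1, "us"),))
print(apply_deletes(rows, data_seq=2, data_partition="p=1", deletes=[delete]))
# [{'id': 1, 'region': 'eu', 'v': 'b'}]
```

Both `("id", "region") = (1, "us")` rows are removed even though the key is duplicated. That matches the point that a natural key delete must delete every record it matches.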

If we add this on a per-deletion file basis it is not clear if there is any
relevance in preserving the concept of a unique row ID.

Agreed. That’s why I’ve been steering us away from the debate about whether
keys are unique or not. Either way, a natural key delete must delete all of
the records it matches.

I would assume that the maximum sequence number should appear in the table
metadata.

Agreed.

[W]ould you make it optional to assign a sequence number to a snapshot?
“Replace” snapshots would not need one.

The only requirement is that it is monotonically increasing. If one isn’t
used, we don’t have to increment. I’d say it is up to the implementation to
decide. I would probably increment it every time to avoid errors.
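A sketch of sequence-number assignment at commit time. It assumes the table metadata keeps the highest number assigned so far, under a hypothetical field name `last_sequence_number`. By default it increments on every commit, including replace snapshots, as suggested above. The `increment` flag marks where an implementation could choose not to. Reusing the previous number never lets the sequence go backward, so the monotonic requirement still holds.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    snapshot_id: int
    operation: str       # e.g. "append", "delete", "replace"
    sequence_number: int


@dataclass
class TableMetadata:
    last_sequence_number: int = 0
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, snapshot_id: int, operation: str, increment: bool = True) -> Snapshot:
        # Only requirement: sequence numbers never go backward.
        seq = self.last_sequence_number + 1 if increment else self.last_sequence_number
        snap = Snapshot(snapshot_id, operation, seq)
        self.snapshots.append(snap)
        self.last_sequence_number = seq
        return snap


meta = TableMetadata()
print(meta.commit(1, "append").sequence_number)                    # 1
print(meta.commit(2, "replace", increment=False).sequence_number)  # 1
print(meta.commit(3, "delete").sequence_number)                    # 2
```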
-- 
Ryan Blue
Software Engineer
Netflix
