> I don’t see that we need [sequence numbers] for file/offset-deletes, since they apply to a specific file. They’re not harmful, but they don’t seem relevant.

These delete files will probably contain a path and an offset and could contain deletes for multiple files. In that case, the sequence number can be used to eliminate delete files that don’t need to be applied to a particular data file, just like the column equality deletes. Likewise, it can be used to drop the delete files when there are no data files with an older sequence number.

> I don’t understand the purpose of the min sequence number, nor what the “min data seq” is.

Min sequence number would be used for pruning delete files without reading all the manifests to find out if there are old data files. If no manifest with data for a partition contains a file older than some sequence number N, then any delete file with a sequence number < N can be removed.

The “min data seq” is the minimum sequence number of a data file. That seems like what we actually want for the pruning I described above.

> Off the top of my head [supporting non-key delete] requires adding additional information to the manifest file, indicating the columns that are used for the deletion. Only equality would be supported; if multiple columns were used, they would be combined with boolean-and. I don’t see anything too tricky about it.

Yes, exactly. I actually phrased it wrong initially. I think it would be simple to extend the equality deletes to do this. We just need a way to have global scope, not just partition scope.

> If we add this on a per-deletion file basis it is not clear if there is any relevance in preserving the concept of a unique row ID.

Agreed. That’s why I’ve been steering us away from the debate about whether keys are unique or not. Either way, a natural key delete must delete all of the records it matches.

> I would assume that the maximum sequence number should appear in the table metadata

Agreed.

> [W]ould you make it optional to assign a sequence number to a snapshot? “Replace” snapshots would not need one.

The only requirement is that it is monotonically increasing.
If one isn’t used, we don’t have to increment. I’d say it is up to the implementation to decide. I would probably increment it every time to avoid errors.

-- 
Ryan Blue
Software Engineer
Netflix
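
P.S. To make the pruning rule above concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a proposed implementation: the `DataFile` and `DeleteFile` classes and every function name are hypothetical, deletes are treated as partition-scoped only, a delete file is conservatively assumed to apply to a data file with an equal sequence number, and a partition with no data files is assumed to make its delete files removable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataFile:
    # Hypothetical record for a data file tracked in a manifest.
    path: str
    partition: str
    sequence_number: int


@dataclass(frozen=True)
class DeleteFile:
    # Hypothetical record for a delete file tracked in a manifest.
    path: str
    partition: str
    sequence_number: int


def applies_to(delete: DeleteFile, data: DataFile) -> bool:
    """Return True if the delete file may need to be applied to the data file.

    Assumption: equal sequence numbers are treated as applicable, which is the
    conservative choice and matches the strict "< N" pruning rule below.
    """
    return (
        delete.partition == data.partition
        and data.sequence_number <= delete.sequence_number
    )


def min_data_sequence(data_files: List[DataFile], partition: str) -> Optional[int]:
    """The "min data seq": smallest data file sequence number in a partition."""
    seqs = [f.sequence_number for f in data_files if f.partition == partition]
    return min(seqs) if seqs else None


def prunable_deletes(
    delete_files: List[DeleteFile], data_files: List[DataFile]
) -> List[DeleteFile]:
    """Delete files with a sequence number < N, where N is the min data seq."""
    result = []
    for d in delete_files:
        n = min_data_sequence(data_files, d.partition)
        # Assumption: with no data files left in the partition, nothing can match.
        if n is None or d.sequence_number < n:
            result.append(d)
    return result


data = [DataFile("a.parquet", "p1", 5), DataFile("b.parquet", "p1", 7)]
deletes = [
    DeleteFile("d1", "p1", 3),
    DeleteFile("d2", "p1", 5),
    DeleteFile("d3", "p1", 6),
]
removable = prunable_deletes(deletes, data)
```

With these inputs only `d1` ends up in `removable`, because the min data seq for `p1` is 5. In practice the min data seq would come from manifest metadata instead of a scan over every data file, which is the point of tracking it.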