rdblue commented on pull request #2354: URL: https://github.com/apache/iceberg/pull/2354#issuecomment-809718836
@openinx, I think @jackye1995 is right about how the case you described would be encoded. Delete files always encode which columns are used for the equality delete, and there is no requirement that a delete file's delete columns match the table's row identifier fields. That's one reason why we can encode deletes right now, before we've added row identifier tracking. It also allows deleting rows by fields other than the row identifier fields, which is what makes the evolution case possible.

The row identifier fields relate to deletes only as a default: when an operation doesn't specify explicit delete columns, we can use the row identifier fields as the delete columns. That's to support the `UPSERT` case, where the identity fields are defined in table metadata rather than in the sink configuration.

From @jackye1995's second comment, I think there is at least some agreement that the row identifier fields don't need to be tracked over time. There is no way to go back to an older snapshot and then modify that data: time travel is read-only, and data manipulation is always applied to the current snapshot. So it is reasonable that only one version of the row identifier ever matters: the one that is configured when the operation starts. Before moving ahead with this, I think we should simplify it and remove the versioning.

I'm also wondering about the field ordering mentioned in the code. Is that relevant? I think of the row identifier fields as unordered. They are simply used to produce a projection of the table schema that serves as a row identifier, in whatever field order the schema has. So I would model this as an unordered set of field IDs rather than as an ordered collection.
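A minimal Python sketch of the model described above, for illustration only. It is not Iceberg's API: `Field`, `Schema`, `EqualityDeleteFile`, `identifier_field_ids`, and `delete_columns_for` are hypothetical names. It assumes the row identifier is a single, unversioned, unordered set of field IDs; that each equality delete file records its own equality field IDs independently of the table's identifier; and that an operation without explicit delete columns falls back to the identifier fields (the `UPSERT` case).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Field:
    field_id: int
    name: str


@dataclass(frozen=True)
class Schema:
    # Hypothetical model, not Iceberg's Schema class.
    fields: Tuple[Field, ...]
    # Row identifier as an unordered set of field IDs, with no version history.
    identifier_field_ids: FrozenSet[int] = frozenset()

    def identifier_projection(self) -> List[str]:
        """Project the identifier fields in the schema's own field order."""
        return [f.name for f in self.fields if f.field_id in self.identifier_field_ids]


@dataclass(frozen=True)
class EqualityDeleteFile:
    # Each delete file carries its own equality field IDs; they need not
    # match the table's current row identifier fields.
    path: str
    equality_field_ids: FrozenSet[int]


def delete_columns_for(schema: Schema,
                       explicit_ids: Optional[FrozenSet[int]] = None) -> FrozenSet[int]:
    """Use the operation's explicit delete columns if given; otherwise
    default to the table's row identifier fields (the UPSERT case)."""
    if explicit_ids is not None:
        return explicit_ids
    if not schema.identifier_field_ids:
        raise ValueError("no delete columns given and no row identifier fields set")
    return schema.identifier_field_ids


schema = Schema(
    fields=(Field(1, "id"), Field(2, "region"), Field(3, "data")),
    identifier_field_ids=frozenset({2, 1}),  # set order is irrelevant
)
print(schema.identifier_projection())  # ['id', 'region']

# UPSERT-style delete: no explicit columns, so the identifier fields are used.
upsert_deletes = EqualityDeleteFile("upsert-deletes.parquet", delete_columns_for(schema))
# Deleting by a different field than the row identifier is still encodable.
data_deletes = EqualityDeleteFile("data-deletes.parquet", delete_columns_for(schema, frozenset({3})))
```

Because the projection is driven by the schema's field order, storing the identifier as a set loses nothing, and because each delete file records its own columns, changing the identifier later does not change how existing deletes are interpreted.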
