Replies inline.

On Mon, May 6, 2019 at 3:01 PM Anton Okolnychyi <aokolnyc...@apple.com> wrote:
> I am also wondering whether it makes sense to have a config that limits
> the number of snapshots we want to track. This config could be based on
> the number of snapshots (e.g. keep only 10000 snapshots) or based on time
> (e.g. keep snapshots for the last 7 days). We could implement both,
> actually. AFAIK, the expiration of snapshots is manual right now. Would it
> make sense to control this via config options, or do we expect users to do
> this?

I'm reluctant to do this without an explicit call from the user or in a
service. The problem is when to expire snapshots. Iceberg is called
regularly to read and write tables. That might seem like a good time to
expire snapshots, but it doesn't make sense for either operation to have
the side effect of physically deleting data files and discarding metadata.
That goes beyond user expectations by performing destructive tasks. It also
changes the guarantees of those operations: reads should be as fast as
possible, and callers may rely on writes not performing additional work
that could cause failures.

> Spark provides queryId and epochId/batchId to all sinks, which must
> ensure that all writes are idempotent. Spark might try to commit the same
> batch multiple times. So, we need to know the latest committed batchId for
> every query. One option is to store this information in the table
> metadata. However, this breaks time travel and rollbacks. We need to have
> this mapping per snapshot. The snapshot summary seems like a reasonable
> choice. Would it make sense to do something similar to "total-records" and
> "total-files" to keep the latest committed batch id for each query? Any
> other ideas are welcome.

For Flink, we're creating a UUID for each checkpoint that writes files,
writing that UUID into the snapshot summary, and then checking whether a
known snapshot has that ID when the write resumes after a failure. That
sounds like what you're suggesting here, but using queryId/epochId as the
write ID.
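To make the idea concrete, here is a minimal sketch of that resume-time
check, using plain dicts to stand in for snapshot summaries. The summary
key name and function names are illustrative only, not Iceberg's actual
API:

```python
# Sketch of the idempotent-resume check: before committing a batch, scan
# known snapshot summaries for the (query_id, epoch_id) pair the writer is
# about to commit. Summary keys here are made up for illustration.

def already_committed(snapshots, query_id, epoch_id):
    """Return True if some snapshot summary already records this query's epoch."""
    key = f"spark.query.{query_id}.epoch-id"
    return any(summary.get(key) == str(epoch_id) for summary in snapshots)

def commit_batch(snapshots, query_id, epoch_id, files):
    """Append a new 'snapshot' unless this (query, epoch) was already written."""
    if already_committed(snapshots, query_id, epoch_id):
        return False  # Spark retried an already-committed batch; skip it
    snapshots.append({
        f"spark.query.{query_id}.epoch-id": str(epoch_id),
        "added-files": str(len(files)),
    })
    return True

snapshots = []
commit_batch(snapshots, "q1", 7, ["a.parquet"])  # first attempt commits
commit_batch(snapshots, "q1", 7, ["a.parquet"])  # retry is a no-op
```

A real implementation would read the summaries of the table's current
snapshot history instead of a list, but the check-before-commit shape is
the same for the Flink UUID and the Spark queryId/epochId variants.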
Sounds like a good plan to me.

rb

--
Ryan Blue
Software Engineer
Netflix