dramaticlly opened a new pull request, #7581: URL: https://github.com/apache/iceberg/pull/7581
close #7560

## What
Add 2 columns to the partition metadata table:
- `last_updated_at`, of type timestamp with time zone (similar to `committed_at` in the snapshot metadata table)
- `last_updated_snapshot_id`, of type long

## How
In the partition metadata table, we moved from iterating over ContentFiles from ManifestReader to iterating over ManifestEntry instead, since an entry contains both the content file and the snapshotId, which are needed to provide the last updated timestamp and snapshot information. We group the files by partition and take the highest snapshot commit timestamp, so that we know exactly when the partition was last modified.

## Why
This can be useful for Iceberg tables ingested by streaming applications such as Flink, to understand which Iceberg partitions are "sealed" based on timestamp. However, writes such as data compaction can also move the last update timestamp forward without an actual data change, in which case the snapshot id can be useful for further analysis.

@szehon-ho could you help review your idea? :)
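
For reviewers who want to reason about the grouping described under "How", here is a minimal sketch of the idea in Python. It is not the code in this PR: `Entry`, `PartitionUpdate`, `last_updates`, and `sealed_partitions` are hypothetical names, the entry is reduced to a partition tuple and a snapshot id, and skipping entries whose snapshot has no known commit timestamp is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Partition = Tuple[str, ...]


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a manifest entry (only the fields used here)."""
    partition: Partition   # partition tuple of the content file
    snapshot_id: int       # snapshot id recorded on the entry


@dataclass(frozen=True)
class PartitionUpdate:
    last_updated_at: int            # commit timestamp, epoch milliseconds
    last_updated_snapshot_id: int


def last_updates(entries: Iterable[Entry],
                 commit_ts_ms: Dict[int, int]) -> Dict[Partition, PartitionUpdate]:
    """Per partition, keep the snapshot with the highest commit timestamp."""
    result: Dict[Partition, PartitionUpdate] = {}
    for entry in entries:
        ts = commit_ts_ms.get(entry.snapshot_id)
        if ts is None:
            # Assumption for this sketch: ignore entries whose snapshot
            # has no known commit timestamp.
            continue
        current = result.get(entry.partition)
        if current is None or ts > current.last_updated_at:
            result[entry.partition] = PartitionUpdate(ts, entry.snapshot_id)
    return result


def sealed_partitions(updates: Dict[Partition, PartitionUpdate],
                      cutoff_ms: int) -> List[Partition]:
    """Partitions whose last update is strictly older than the cutoff."""
    return sorted(p for p, u in updates.items() if u.last_updated_at < cutoff_ms)
```

A consumer could call `sealed_partitions(last_updates(entries, commit_ts_ms), cutoff_ms)` to list partitions that have not changed since a cutoff. As noted under "Why", a compaction would still move `last_updated_at` forward, so `last_updated_snapshot_id` is kept alongside it to let the caller inspect what kind of commit caused the update.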
