dramaticlly opened a new pull request, #7581:
URL: https://github.com/apache/iceberg/pull/7581

   Closes #7560
   
   ## What
   Add two columns to the partitions metadata table:
   - `last_updated_at`, of type timestamp with time zone (similar to 
`committed_at` in the snapshots metadata table)
   - `last_updated_snapshot_id`, of type long
   
   ## How
   In the partitions metadata table, we now iterate over `ManifestEntry` 
objects instead of the `ContentFile`s returned by `ManifestReader`, because 
each entry carries both the content file and the snapshot ID, which are needed 
to derive the last-updated timestamp and snapshot. We group files by partition 
and take the latest snapshot commit timestamp, so we know exactly when each 
partition was last modified.
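
The grouping step can be illustrated with a minimal, standalone sketch. It is not the code in this PR: `Entry`, `LastUpdate`, and `lastUpdates` are hypothetical stand-ins that do not use Iceberg classes, and the snapshot commit timestamps are assumed to be available as a plain map from snapshot ID to epoch milliseconds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionLastUpdated {

  /** Hypothetical stand-in for a manifest entry: a file's partition plus the snapshot that wrote it. */
  public static final class Entry {
    final String partition;
    final long snapshotId;

    public Entry(String partition, long snapshotId) {
      this.partition = partition;
      this.snapshotId = snapshotId;
    }
  }

  /** Hypothetical result row holding the two new column values for one partition. */
  public static final class LastUpdate {
    public final long lastUpdatedAtMillis;
    public final long lastUpdatedSnapshotId;

    public LastUpdate(long lastUpdatedAtMillis, long lastUpdatedSnapshotId) {
      this.lastUpdatedAtMillis = lastUpdatedAtMillis;
      this.lastUpdatedSnapshotId = lastUpdatedSnapshotId;
    }
  }

  /**
   * Groups entries by partition and keeps, for each partition, the snapshot
   * with the latest commit timestamp.
   */
  public static Map<String, LastUpdate> lastUpdates(
      List<Entry> entries, Map<Long, Long> commitMillisBySnapshotId) {
    Map<String, LastUpdate> result = new HashMap<>();
    for (Entry entry : entries) {
      Long committedAt = commitMillisBySnapshotId.get(entry.snapshotId);
      if (committedAt == null) {
        // Assumption: an entry whose snapshot is unknown contributes no timestamp.
        continue;
      }
      LastUpdate current = result.get(entry.partition);
      if (current == null || committedAt > current.lastUpdatedAtMillis) {
        result.put(entry.partition, new LastUpdate(committedAt, entry.snapshotId));
      }
    }
    return result;
  }
}
```

In this sketch a partition with files written by snapshots 1 and 3 reports snapshot 3 and its commit time, provided snapshot 3 was committed later.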
   
   ## Why
   This could be useful for Iceberg tables ingested by streaming applications 
such as Flink, to determine which partitions are "sealed" based on timestamp. 
However, writes such as data compaction can also move the last-updated 
timestamp forward without any actual data change, in which case the snapshot 
ID can be used for further analysis.
   
   @szehon-ho, could you help review this, since it was your idea? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


