qphien commented on issue #2093:
URL: https://github.com/apache/iceberg/issues/2093#issuecomment-762191244


   > You might be able to do this via the various metadata tables, though it 
would be somewhat complex. https://iceberg.apache.org/spark/#inspecting-tables
   > 
   > It looks like you could achieve this by joining a table's `manifest` 
metadata table, which has a `partitions` column indicating 
what partition columns have been affected, with the table's `snapshots` table 
and `history` metadata table.
   > 
   > There are some examples of joining the two, but essentially you'd want to 
explode the table's snapshot metadata table on the `manifest_list` column so 
that you get one row in the expanded snapshots table for each updated / created 
manifest. That manifest path can be joined with the `path` column in the 
`manifest` metadata table to then get all of the partitions that are involved 
in that snapshot. You can find when exactly that snapshot was made current by 
joining on the `made_current_at` field from the metadata `history` table.
   
   Thanks @kbendick for your reply. Yes, we can join `manifest` with 
`snapshots` and `history` to get the partition create/update time, but this join 
query is inefficient when there are a large number of snapshots, because we have 
to scan all snapshots and manifests.
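
To make the cost concrete, here is a minimal Python sketch of the join logic described above. It uses hand-written, hypothetical rows in place of the real metadata tables; it is not the Iceberg API. The column names `manifest_list`, `path`, `partitions`, and `made_current_at` follow the discussion above, while `snapshot_id` as the join key, the list-valued `manifest_list`, and the sample values are assumptions made for illustration.

```python
# Hypothetical rows standing in for the snapshots, manifest and history
# metadata tables; real tables would be read through Spark.
snapshots = [
    {"snapshot_id": 1, "manifest_list": ["m1.avro"]},
    {"snapshot_id": 2, "manifest_list": ["m1.avro", "m2.avro"]},
    {"snapshot_id": 3, "manifest_list": ["m1.avro", "m2.avro", "m3.avro"]},
]
manifests = [
    {"path": "m1.avro", "partitions": ["dt=2021-01-01"]},
    {"path": "m2.avro", "partitions": ["dt=2021-01-02"]},
    {"path": "m3.avro", "partitions": ["dt=2021-01-01"]},
]
history = [
    {"snapshot_id": 1, "made_current_at": "2021-01-01T10:00:00"},
    {"snapshot_id": 2, "made_current_at": "2021-01-02T10:00:00"},
    {"snapshot_id": 3, "made_current_at": "2021-01-03T10:00:00"},
]


def partition_times(snapshots, manifests, history):
    """Return {partition: (create_time, update_time)} using ISO timestamps."""
    made_current = {h["snapshot_id"]: h["made_current_at"] for h in history}

    # "Explode" every snapshot on its manifests and remember the earliest
    # snapshot in which each manifest path appears. Note that this loop
    # touches every snapshot, which is the inefficiency discussed above.
    first_seen = {}
    for snap in snapshots:
        ts = made_current.get(snap["snapshot_id"])
        if ts is None:
            continue
        for path in snap["manifest_list"]:
            if path not in first_seen or ts < first_seen[path]:
                first_seen[path] = ts

    # Join manifests on path and fold the timestamps per partition.
    result = {}
    for m in manifests:
        ts = first_seen.get(m["path"])
        if ts is None:
            continue
        for part in m["partitions"]:
            created, updated = result.get(part, (ts, ts))
            result[part] = (min(created, ts), max(updated, ts))
    return result


times = partition_times(snapshots, manifests, history)
```

The work grows with the total number of snapshots and manifests, even when only the current state of the table is of interest.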
   
   Could we add an additional `create-time` field to `manifest.data_file`? In 
that case, only the latest snapshot and its related manifests would need to be 
scanned.
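
As a rough illustration of the proposal, the sketch below assumes each data file entry reachable from the latest snapshot carried a `create_time` value. That field does not exist today; it is the hypothetical field suggested above, and the row layout (`file_path`, `partition`) and sample values are likewise assumptions.

```python
# Hypothetical data file entries from the manifests of the latest snapshot,
# each carrying the proposed (not yet existing) create_time field.
latest_snapshot_files = [
    {"file_path": "a.parquet", "partition": "dt=2021-01-01",
     "create_time": "2021-01-01T10:00:00"},
    {"file_path": "b.parquet", "partition": "dt=2021-01-02",
     "create_time": "2021-01-02T10:00:00"},
    {"file_path": "c.parquet", "partition": "dt=2021-01-01",
     "create_time": "2021-01-03T10:00:00"},
]


def partition_times_from_data_files(data_files):
    """Return {partition: (earliest, latest)} create_time of live files."""
    result = {}
    for f in data_files:
        ts = f["create_time"]
        earliest, latest = result.get(f["partition"], (ts, ts))
        result[f["partition"]] = (min(earliest, ts), max(latest, ts))
    return result


live_times = partition_times_from_data_files(latest_snapshot_files)
```

No older snapshot is read here. One caveat of this sketch: the earliest value only reflects files that are still live in the latest snapshot, so it may differ from the true partition creation time after files have been rewritten or removed.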
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


