pan3793 opened a new issue #2231: URL: https://github.com/apache/iceberg/issues/2231
Currently, we build ETL workflows on Hive tables. To get version control over the data, we add a top-level partition named `ts` to every table and assign `ts` a specific value when triggering a workflow, so a single `ts` value gives us the expected version of the data across all tables.

I know I can use the auto-generated `snapshot_id` to fetch a specific version of the data in an Iceberg table. That means that if I want to identify the snapshots of all tables involved in a workflow, I need to persist each table's `snapshot_id` whenever the table is updated so that I can use it later.

Is there a way to assign a snapshot a `snapshot_name` in addition to its `snapshot_id`, so that we can conveniently track the snapshots of related tables, such as those generated in the same workflow?
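For illustration, the workaround described above (persisting each table's `snapshot_id` under the workflow's `ts` value) might look roughly like the sketch below. It is only a sketch under stated assumptions: `SnapshotRegistry`, its methods, and the JSON file layout are hypothetical names invented here, not part of Iceberg, and the snapshot ids are assumed to be obtained from the tables by whatever means the engine provides.

```python
import json
from pathlib import Path


class SnapshotRegistry:
    """Hypothetical helper: maps a workflow label (the `ts` value) to
    {table name: snapshot_id}. Not an Iceberg API."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously recorded runs if the file already exists.
        self.runs = json.loads(self.path.read_text()) if self.path.exists() else {}

    def record(self, ts, table, snapshot_id):
        """Remember which snapshot of `table` belongs to workflow run `ts`."""
        self.runs.setdefault(ts, {})[table] = snapshot_id
        # Rewrite the whole file on every update; fine for a sketch.
        self.path.write_text(json.dumps(self.runs, indent=2))

    def snapshots_for(self, ts):
        """Return a copy of {table: snapshot_id} for run `ts` (empty if unknown)."""
        return dict(self.runs.get(ts, {}))
```

A workflow would call `record(ts, table, snapshot_id)` after each table commit and later call `snapshots_for(ts)` to look up the ids to read. This is the bookkeeping that a built-in `snapshot_name` would make unnecessary.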
