McKMarcBruchner opened a new issue, #14606: URL: https://github.com/apache/iceberg/issues/14606
### Query engine

Spark 3.4.3

### Question

Hi Iceberg team,

I was wondering how to best use the [rewrite_table_path](https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_table_path) procedure on a Backup.

My situation is the following:

- I have an S3 bucket on which Iceberg stores the data and metadata files
- My metastore is being stored in a Hive metastore in a Postgres DB on RDS
- I have a backup of that S3 bucket on another S3 bucket in another region, maybe even another account
- I also have a backup of the RDS on the other account
- Let's say my original S3 bucket got corrupted or I can't reach it anymore, so I need to switch to the backup bucket and backup RDS
- Now I wanted to use `rewrite_table_path` and `register_table` to recreate the tables so that I can use them

What I gather from the documentation:

- the `rewrite_table_path` needs to have a registered table to work, because you are specifying the table name in the CALL command
- on the other hand it says that only after I have run `rewrite_table_path`, I should run `register_table` with the new metadata.json. Which makes total sense to me.

My problem is now, how can I run `rewrite_table_path` without registering the table first? In this case, Spark returns me a `Couldn't load table`, which makes sense, because the table does not exist. And in case I first register the table, Spark returns another error `Path s3a://backup-bucket/test_table/metadata/v1.metadata.json does not start with s3a://original-bucket/test_table/`.

I understand how the `rewrite_table_path` would work if I can run this on my original bucket with the existing table, then move the data and metadata files to a new bucket and run `register_table` there. But that might not be possible for me if the old bucket got destroyed or corrupted or is otherwise unreachable.

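
For reference, the "happy path" described in the last paragraph (original bucket still reachable) could be sketched roughly as below. This is only an illustration of the two `CALL` statements, not a verified recovery recipe: the catalog name `spark_catalog`, the namespace `db`, and the two helper functions are hypothetical, while the bucket paths are taken from the error message above. The argument names (`table`, `source_prefix`, `target_prefix`, `metadata_file`) follow the linked procedure documentation.

```python
def rewrite_table_path_sql(table, source_prefix, target_prefix, catalog="spark_catalog"):
    """Build the CALL statement for Iceberg's rewrite_table_path procedure.

    The catalog name is an assumption; replace it with your own.
    """
    args = [
        f"table => '{table}'",
        f"source_prefix => '{source_prefix}'",
        f"target_prefix => '{target_prefix}'",
    ]
    return f"CALL {catalog}.system.rewrite_table_path({', '.join(args)})"


def register_table_sql(table, metadata_file, catalog="spark_catalog"):
    """Build the CALL statement for Iceberg's register_table procedure."""
    return (
        f"CALL {catalog}.system.register_table("
        f"table => '{table}', metadata_file => '{metadata_file}')"
    )


# Step 1: run against the ORIGINAL table, which must still be loadable.
rewrite_sql = rewrite_table_path_sql(
    table="db.test_table",  # hypothetical namespace
    source_prefix="s3a://original-bucket/test_table",
    target_prefix="s3a://backup-bucket/test_table",
)

# Step 2: after the rewritten metadata and the data files are in the backup
# bucket, register the table from the rewritten metadata.json.
register_sql = register_table_sql(
    table="db.test_table",
    metadata_file="s3a://backup-bucket/test_table/metadata/v1.metadata.json",
)

print(rewrite_sql)
print(register_sql)
```

With a live session, each string would be passed to `spark.sql(...)`. Step 1 is exactly the step that fails in my scenario, because it needs the table to be loadable with its original paths.
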
In [this blog](https://www.dremio.com/blog/disaster-recovery-for-apache-iceberg-tables-restoring-from-backup-and-getting-back-online/) they state that my approach should work, but I cannot execute `3. Check for File Path Changes Before Recovery` because of the problem described above. I feel that I'm missing something very obvious. Please advise!
