McKMarcBruchner opened a new issue, #14606:
URL: https://github.com/apache/iceberg/issues/14606

   ### Query engine
   
   Spark 3.4.3
   
   ### Question
   
   Hi Iceberg team,
   
   I was wondering how to best use the [rewrite_table_path](https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_table_path) procedure on a backup.
   
   My situation is the following:
   
   - I have an S3 bucket on which Iceberg stores the data and metadata files
   - My catalog is a Hive metastore backed by a Postgres DB on RDS
   - I have a backup of that S3 bucket in another S3 bucket in another region, maybe even another account
   - I also have a backup of the RDS instance in the other account
   - Let's say my original S3 bucket gets corrupted or I can't reach it anymore, so I need to switch to the backup bucket and the backup RDS instance
   - I then want to use `rewrite_table_path` and `register_table` to recreate the tables so that I can use them
   
   What I gather from the documentation:
   
   - `rewrite_table_path` needs a registered table to work, because the table name is specified in the `CALL` command
   - On the other hand, it says that I should run `register_table` with the new `metadata.json` only after I have run `rewrite_table_path`, which makes total sense to me.
   
   My problem is: how can I run `rewrite_table_path` without registering the table first? If I try, Spark returns a `Couldn't load table` error, which makes sense, because the table does not exist.
   
   If I register the table first, Spark returns a different error: `Path s3a://backup-bucket/test_table/metadata/v1.metadata.json does not start with s3a://original-bucket/test_table/`.
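
To illustrate the second error, here is a toy model of a prefix check that would produce the same message. It is only a sketch of the behavior I seem to be hitting, not Iceberg's actual implementation; `replace_prefix` and the two constants are hypothetical names, and the bucket paths are the ones from the error above.

```python
SOURCE_PREFIX = "s3a://original-bucket/test_table/"
TARGET_PREFIX = "s3a://backup-bucket/test_table/"


def replace_prefix(path: str, source_prefix: str, target_prefix: str) -> str:
    """Swap source_prefix for target_prefix; reject paths outside source_prefix."""
    if not path.startswith(source_prefix):
        raise ValueError(f"Path {path} does not start with {source_prefix}")
    return target_prefix + path[len(source_prefix):]


# A path still under the original prefix is rewritten without complaint.
original_path = SOURCE_PREFIX + "metadata/v1.metadata.json"
print(replace_prefix(original_path, SOURCE_PREFIX, TARGET_PREFIX))

# A table registered from the backup reports a backup-bucket location,
# so the same check rejects it.
backup_path = TARGET_PREFIX + "metadata/v1.metadata.json"
try:
    replace_prefix(backup_path, SOURCE_PREFIX, TARGET_PREFIX)
except ValueError as err:
    print(err)
```

If this model is roughly right, the registered table's own metadata location is in the backup bucket, while the source prefix I pass still points at the original bucket.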
   
   I understand how `rewrite_table_path` would work if I could run it against the existing table on my original bucket, then copy the data and metadata files to the new bucket and run `register_table` there. But that might not be possible if the old bucket has been destroyed, corrupted, or is otherwise unreachable.
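
For reference, the flow that works while the original bucket is still reachable can be sketched as the two `CALL` statements below, built as plain strings so the snippet runs without Spark. The catalog name `my_catalog`, the table identifier `db.test_table`, and both helper functions are hypothetical; the argument names (`table`, `source_prefix`, `target_prefix`, `metadata_file`) are taken from the Spark procedures documentation, and the metadata file path is an assumed example.

```python
def rewrite_table_path_sql(catalog: str, table: str,
                           source_prefix: str, target_prefix: str) -> str:
    """Build the CALL statement for the rewrite_table_path procedure."""
    return (
        f"CALL {catalog}.system.rewrite_table_path("
        f"table => '{table}', "
        f"source_prefix => '{source_prefix}', "
        f"target_prefix => '{target_prefix}')"
    )


def register_table_sql(catalog: str, table: str, metadata_file: str) -> str:
    """Build the CALL statement for the register_table procedure."""
    return (
        f"CALL {catalog}.system.register_table("
        f"table => '{table}', "
        f"metadata_file => '{metadata_file}')"
    )


# Step 1: run against the ORIGINAL catalog, where the table is registered.
step_rewrite = rewrite_table_path_sql(
    "my_catalog", "db.test_table",
    "s3a://original-bucket/test_table/",
    "s3a://backup-bucket/test_table/",
)

# Step 2 (outside Spark): copy the rewritten metadata and the data files
# to the backup bucket.

# Step 3: run against the NEW catalog, pointing at the rewritten metadata file.
step_register = register_table_sql(
    "my_catalog", "db.test_table",
    "s3a://backup-bucket/test_table/metadata/v1.metadata.json",
)

print(step_rewrite)
print(step_register)
# With a live session, each statement would be passed to spark.sql(...).
```

Step 1 is exactly the part I cannot run once the original bucket is gone.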
   
   In [this blog](https://www.dremio.com/blog/disaster-recovery-for-apache-iceberg-tables-restoring-from-backup-and-getting-back-online/) they state that my approach should work, but I cannot execute `3. Check for File Path Changes Before Recovery` because of the problem described above.
   
   I feel that I'm missing something very obvious. Please advise!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

