Hi Aurorans,

As you may have seen (https://issues.apache.org/jira/browse/AURORA-1648), we ran into an issue when upgrading a cluster that uses dbScripts in snapshots. To restate the problem from the ticket: when the scheduler starts up, it creates the H2 database from schema.sql, which contains any newly added (or changed) tables. If a dbScript exists in the snapshot, the scheduler then drops all tables and runs the script, which re-creates the database in its previous form and repopulates all the data. At this point, if a table was added that was not in the snapshot, that table is lost.
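To make the failure mode concrete, a contrived sketch (the table names here are made up, not the actual Aurora schema):

    -- schema.sql on the new scheduler version, including a table added in this release:
    CREATE TABLE task_configs (id BIGINT PRIMARY KEY, config VARCHAR NOT NULL);
    CREATE TABLE new_table (id BIGINT PRIMARY KEY, value VARCHAR NOT NULL);

    -- dbScript restored from the snapshot, generated by the previous version; it only
    -- knows about the tables that existed when the snapshot was taken:
    CREATE TABLE task_configs (id BIGINT PRIMARY KEY, config VARCHAR NOT NULL);
    INSERT INTO task_configs VALUES (1, '...');
    -- ...more data INSERTs...
    -- Since all tables are dropped before this runs, new_table no longer exists afterwards.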
The solution here is to run a migration after restoring the database; my question is what level of complexity we want for migrations. I see two possible options (rough sketches of both are at the bottom of this mail):

1. The relatively simple approach: we have a single migration.sql alongside schema.sql which contains any changes to run against the database. After restoring the db from a dbScript in the snapshot, we would run this script. On release boundaries(?), we clear the script out. The main caveat here is that the DDL statements in this script must be idempotent, since they would potentially be run multiple times. Another issue is that it makes it more complicated to jump over versions (upgrading from v0.12.0 to v0.14.0 means you might miss a migration that only existed in the 0.13.0 release).

2. A slightly more involved approach: track the schema version in the database itself. Instead of a single migration.sql script that contains all migrations, we could have migrations/001.sql, migrations/002.sql, etc. We'd add a new table to the database to track which migrations have been run. After restoring via dbScript, we'd check that table and run any migrations necessary (or all of them if the table does not exist).

Thoughts on these options? Anyone have any alternatives?
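For concreteness, rough sketches of both approaches (all names below, like new_table and migration_changelog, are hypothetical and just show the shape of each option):

Option 1, a single migration.sql kept idempotent so it is safe to run repeatedly after a dbScript restore (I believe H2 supports IF NOT EXISTS on ADD COLUMN as well as CREATE TABLE):

    -- migration.sql: cleared out on release boundaries, every statement idempotent.
    CREATE TABLE IF NOT EXISTS new_table (
      id BIGINT PRIMARY KEY,
      value VARCHAR NOT NULL
    );
    ALTER TABLE task_configs ADD COLUMN IF NOT EXISTS tier VARCHAR;

Option 2, a changelog table plus numbered migration scripts; after a dbScript restore the scheduler would consult the changelog and apply only the scripts whose version is not recorded (or all of them if the table doesn't exist yet):

    -- Tracking table, created by schema.sql:
    CREATE TABLE IF NOT EXISTS migration_changelog (
      version INT PRIMARY KEY,
      applied_on TIMESTAMP NOT NULL
    );

    -- migrations/001.sql: plain, non-idempotent DDL; only run if version 1 is absent.
    CREATE TABLE new_table (id BIGINT PRIMARY KEY, value VARCHAR NOT NULL);
    INSERT INTO migration_changelog (version, applied_on) VALUES (1, CURRENT_TIMESTAMP);

Cheers,

Joshua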