You go to sleep for one night and they change everything ... :) I've just caught up with the new backup system proposal. This changes the schema migrations design a bit - I'll update the doc.
On 12 June 2014 21:33, John Meinel <[email protected]> wrote:
> If I read the conversations on IRC, they were talking about changing the
> backup to be just a POST to an HTTP endpoint, and you get back the contents
> of the DB, which would be deleted when the backup completes. Though you
> could still probably use whatever internal helpers spool the data to a temp
> location to do the same for backup & restore.
>
> On Thu, Jun 12, 2014 at 8:40 AM, Menno Smits <[email protected]>
> wrote:
>
>> I've updated the schema migration document with the ideas that have come
>> up in recent discussions. The scope of the schema migrations work has been
>> reduced somewhat by making the upgrade step Apply/Rollback concept a
>> separate project (database changes can be rolled back through the use of
>> mongobackup/restore).
>>
>> I've raised a few issues in the comments about handling various failure
>> modes. Input would be greatly appreciated.
>>
>> Nate: it would be good for you to have a look at this because we're
>> planning on leaning on the new backup functionality quite a bit. Let me
>> know if anything I'm proposing isn't compatible with what your team is
>> working on.
>>
>> https://docs.google.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>
>> On 6 June 2014 13:18, Menno Smits <[email protected]> wrote:
>>
>>> After some fruitful discussions, Tim and I have come up with something
>>> that I think is starting to look pretty good. There's a significant change
>>> to how we handle backups and rollbacks that seems like the right direction.
>>> I've tried to capture it all in a Google Doc as this email thread is
>>> starting to get impractical. Feel free to add comments and edit.
>>>
>>> https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing
>>>
>>> On 3 June 2014 13:34, Menno Smits <[email protected]> wrote:
>>>
>>>> On 30 May 2014 01:47, John Meinel <[email protected]> wrote:
>>>>
>>>>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>>>>> I've got so far:
>>>>>>
>>>>>> - Introduce a "database-version" key into the EnvironConfig document
>>>>>> which tracks the Juju version that the database schema matches. More on
>>>>>> this later.
>>>>>
>>>>> For clarity, I would probably avoid putting this key into
>>>>> EnvironConfig, but instead have it in a separate document. That also makes
>>>>> it easy to watch for just this value changing.
>>>>
>>>> SGTM. I've got no strong opinion on this.
>>>>
>>>>> Potentially, I would decouple the value in this key from the actual
>>>>> agent versions. Otherwise you do null DB schema upgrades on every minor
>>>>> release. Maybe that's sane, but it *feels* like they are two separate
>>>>> issues. (What version the DB schema is at is orthogonal to what version
>>>>> of the code I'm running.) It may be that the clarity and simplification of
>>>>> just one version wins out.
>>>>
>>>> I think it makes sense to just use the Juju version for the DB schema
>>>> version. When you think about it, the DB schema is actually quite tightly
>>>> coupled to the code version so why introduce another set of numbers to
>>>> track? I'm thinking that if there are no schema upgrade steps required for
>>>> a given software version then the DB is left alone except that the schema
>>>> version number gets bumped.
>>>>
>>>>>> - Introduce a MasterStateServer upgrade target which marks upgrade
>>>>>> steps which are only to run on the master state server. Also more below.
>>>>>
>>>>> This is just a compiled-in list of steps to apply, right?
>>>>
>>>> Yes.
>>>> I was thinking that schema upgrade steps would be defined in the
>>>> same place and way that other upgrade steps are currently defined so that
>>>> they could even be interleaved with other kinds of upgrade steps.
>>>>
>>>> What I'm proposing here is that where we currently have 2 types of
>>>> upgrade targets - AllMachines and StateServer - we introduce a third target
>>>> called MasterStateServer which would be primarily (exclusively?) used for
>>>> schema migration steps.
>>>>
>>>>>> - Non-master JobManageEnviron machine agents run their upgrade steps
>>>>>> as usual and then watch for EnvironConfig changes. They don't consider
>>>>>> the upgrade to be complete (and therefore don't let their other workers
>>>>>> start) until database-version matches agent-version. This prevents the
>>>>>> new version of the state server agents from running before the schema
>>>>>> migrations for the new software version have run.
>>>>>
>>>>> I'm not sure if schema upgrades should be done before or after other
>>>>> upgrade steps. Given we're really stopping the world here, it might be
>>>>> prudent to just wait to do your upgrade steps until you know that the DB
>>>>> upgrade has been done.
>>>>
>>>> As mentioned above, with what I'm thinking there is no real distinction
>>>> between schema migration steps and other types of upgrade steps so there's
>>>> no concept of schema migrations happening before or after other upgrade
>>>> steps.
>>>>
>>>>>> *Observations/Questions/Issues*
>>>>>>
>>>>>> - There are a lot of moving parts here. What could be made simpler?
>>>>>>
>>>>>> - What do we do if the master mongo database or host fails during the
>>>>>> upgrade? Is it a goal for one of the other state servers to take over
>>>>>> and run the schema upgrades itself and let the upgrade finish? If so, is
>>>>>> this a must-have up-front requirement or a nice-to-have?
>>>>>
>>>>> Some thoughts:
>>>>>
>>>>> 1. If the actual master mongo DB fails, that will cause reelection,
>>>>> which should cause all of the servers to get their connections to Mongo
>>>>> bounced, and then they'll notice that there is a new master who is
>>>>> responsible for applying the database changes.
>>>>
>>>> We will have to do some testing to ensure that this scenario actually
>>>> works. Maybe I'm overthinking it, but my gut says there's plenty
>>>> to go wrong here.
>>>>
>>>>> 2. If it is just the master Juju process that fails, I don't think
>>>>> there is any great expectation that a different process running the same
>>>>> code is going to succeed, is there?
>>>>
>>>> Agreed.
>>>>
>>>>> 3. There is also a fair possibility that the schema migration we've
>>>>> written won't work with real data in the wild. (We assumed this field was
>>>>> never written, but suddenly it is, etc.) We've talked about the ability to
>>>>> have Upgrade roll back, and maybe we could consider that here. Some
>>>>> possible steps are:
>>>>>
>>>>>    1. Copy the db to another location
>>>>>    2. Try to apply the schema updates (either in place or only to the
>>>>>       backup)
>>>>>    3. If upgrade fails, roll back to the old version, and update the
>>>>>       AgentVersion in environ config so that the other agents will try to
>>>>>       "upgrade" themselves back to the old version. This would also be a
>>>>>       reason to do the DB schema changes before actually applying any
>>>>>       other upgrade steps. We probably want some sort of "could not
>>>>>       upgrade because of" tracking here, so that it can be reported to
>>>>>       the user
>>>>
>>>> I like this and it should work as long as there's enough storage
>>>> available to make a copy of the database. I'm not exactly clear on how we
>>>> would revert to the backup instance if the migration fails but I'm sure
>>>> this can be made to work.
>>>> It might be enough for the first iteration if we
>>>> initially make some kind of backup that the user has access to that they
>>>> can restore from manually.
>>>>
>>>> As you mention, this would benefit from the DB schema steps being
>>>> separate from the other upgrade steps. I have no real issue with this other
>>>> than having them separate will probably mean more change to the existing
>>>> upgrades package. This voids some of the things I've said earlier in this
>>>> email :-) I'll think some more about how this could look.
>>>>
>>>>> 4. As long as we do some sort of "backup before applying the change" we
>>>>> allow users a way to recover the system if something failed. If we have
>>>>> proper Backup support integrated into core, one option is that we just
>>>>> trigger a backup and then upgrade in place; if stuff breaks, we at least
>>>>> have *something* that should be recoverable.
>>>>
>>>> It's a pity that the full Backup feature isn't there yet as this could
>>>> be a nice way to get a first version of schema migrations working quickly.
>>>>
>>>>>> - Upgrade steps currently have access to State but I think this
>>>>>> probably won't be sufficient to perform many types of schema migrations
>>>>>> (e.g. accessing defunct fields, removing fields, adding indexes etc). Do
>>>>>> we want to extend State to provide a number of schema migration helpers
>>>>>> or do we expose mongo connections directly to the upgrade steps?
>>>>>
>>>>> I believe the existing Upgrade logic actually has access to the API
>>>>> not to State itself, so we'll need something there. The State object has
>>>>> raw mongo collections on it (environs, charms, etc).
>>>>
>>>> The existing upgrade logic has access to both the API and State (the
>>>> latter only on state server machines obviously, that arg is nil otherwise)
>>>> so that's already done.
>>>>
>>>>> DB Schema (IMO) inherently is going to be at the raw DB level, vs
>>>>> changes in the abstract objects. (I expect that it will be defined in
>>>>> terms of "apply this function to all entities in this collection", rather
>>>>> than "iterate over Machine objects and set data on them".)
>>>>> I could be wrong, but it does seem like we'll want the syntax of db
>>>>> schema changes to be on mgo.Collection objects, and not on State objects.
>>>>
>>>> I completely agree that we need schema migrations to work in the
>>>> mongodb world and not via application level objects. Some schema migration
>>>> tasks just won't make sense at the application object level.
>>>>
>>>> State doesn't expose its mgo collections to the outside though so how
>>>> would a schema migration step interact with them, especially for tasks such
>>>> as adding new collections or indexes? Do we add a bunch of schema migration
>>>> helper methods on to State (e.g. AddCollection(), AddIndex(),
>>>> ApplyToCollection() etc) or do we add a single method which exposes the
>>>> mongo database object (clearly marked as exclusively there for use by
>>>> schema upgrade steps), or do we have schema migration steps pass a function
>>>> that takes a mongo DB object to act on? We already expose the mongo session
>>>> with MongoSession() so there is some precedent for this.
>>>>
>>>>>> - There is a possibility that a non-master state server won't
>>>>>> upgrade, blocking the master from completing the upgrade. Should there
>>>>>> be a timeout before the master gives up on state servers upgrading
>>>>>> themselves and performs its own upgrade steps anyway?
>>>>>
>>>>> Arguably this is a better case for "rollback" than "just move forward".
>>>>
>>>> Ok - sounds good.
>>>>
>>>>>> - Given the order of documents a juju system stores, it's likely that
>>>>>> the schema migration steps will be quite quick, even for a large
>>>>>> installation.
>>>>>
>>>>> "order of magnitude" right?
>>>>
>>>> Yes - sorry that wasn't very clear.
>>>>
>>>>> Yeah, we're talking megabytes, GB being really large, not many GB of
>>>>> data.
>>>>
>>>> Great.
>>>>
>>>> Thanks for the excellent feedback.
>>>>
>>>> - Menno
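
As a reference for the upgrade-target and version-gating ideas discussed in the thread, here is a minimal sketch. It assumes a simplified model: the Target string values, the Step and Machine types, and the StepsFor and UpgradeComplete helpers are all hypothetical and are not the existing upgrades package API. Only the target names AllMachines, StateServer and MasterStateServer come from the discussion.

```go
package main

import "fmt"

// Target says which machines an upgrade step runs on. The string values
// are placeholders chosen for this sketch.
type Target string

const (
	AllMachines       Target = "allMachines"
	StateServer       Target = "stateServer"
	MasterStateServer Target = "masterStateServer" // the proposed third target
)

// Step is a hypothetical upgrade step tied to one target.
type Step struct {
	Description string
	Target      Target
}

// Machine describes the agent deciding which steps to run (hypothetical).
type Machine struct {
	IsStateServer bool
	IsMaster      bool
}

// StepsFor returns the steps that apply to m. Order is preserved, so
// schema steps can be interleaved with other kinds of upgrade steps.
func StepsFor(m Machine, steps []Step) []Step {
	var out []Step
	for _, s := range steps {
		switch s.Target {
		case AllMachines:
			out = append(out, s)
		case StateServer:
			if m.IsStateServer {
				out = append(out, s)
			}
		case MasterStateServer:
			if m.IsStateServer && m.IsMaster {
				out = append(out, s)
			}
		}
	}
	return out
}

// UpgradeComplete reports whether a state server agent may start its other
// workers: only once the recorded database version matches its own version.
func UpgradeComplete(databaseVersion, agentVersion string) bool {
	return databaseVersion == agentVersion
}

func main() {
	steps := []Step{
		{"update agent config", AllMachines},
		{"migrate machines collection", MasterStateServer},
		{"refresh state server certs", StateServer},
	}
	nonMaster := Machine{IsStateServer: true}
	for _, s := range StepsFor(nonMaster, steps) {
		fmt.Println("non-master runs:", s.Description)
	}
	// The version strings here are arbitrary examples.
	fmt.Println("may start workers:", UpgradeComplete("1.18.0", "1.19.0"))
}
```

In this model a non-master state server skips the MasterStateServer steps and simply waits until the master has bumped the database version.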
-- 
Juju-dev mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
