After some fruitful discussions, Tim and I have come up with something that I think is starting to look pretty good. There's a significant change to how we handle backups and rollbacks that seems like the right direction. I've tried to capture it all in a Google Doc as this email thread is starting to get impractical. Feel free to add comments and edit.
https://docs.google.com/a/canonical.com/document/d/1pBxGEGTmGa1Y61YJ3KZ7vwOP-7Gumt4Czr_spINHHXM/edit?usp=sharing

On 3 June 2014 13:34, Menno Smits wrote:

> On 30 May 2014 01:47, John Meinel wrote:
>
>>> Building on John's thoughts, and adding Tim's and mine, here's what
>>> I've got so far:
>>>
>>> - Introduce a "database-version" key into the EnvironConfig document
>>>   which tracks the Juju version that the database schema matches.
>>>   More on this later.
>>
>> For clarity, I would probably avoid putting this key into
>> EnvironConfig, but instead have it in a separate document. That also
>> makes it easy to watch for just this value changing.
>
> SGTM. I've got no strong opinion on this.
>
>> Potentially, I would decouple the value in this key from the actual
>> agent versions. Otherwise you do null DB schema upgrades on every
>> minor release. Maybe that's sane, but it *feels* like they are two
>> separate issues. (The version of the DB schema is orthogonal to the
>> version of the code I'm running.) It may be that the clarity and
>> simplification of just one version wins out.
>
> I think it makes sense to just use the Juju version for the DB schema
> version. When you think about it, the DB schema is actually quite
> tightly coupled to the code version so why introduce another set of
> numbers to track? I'm thinking that if there are no schema upgrade
> steps required for a given software version then the DB is left alone
> except that the schema version number gets bumped.
>
>>> - Introduce a MasterStateServer upgrade target which marks upgrade
>>>   steps which are only to run on the master state server. Also more
>>>   below.
>>
>> This is just a compiled-in list of steps to apply, right?
>
> Yes.
> I was thinking that schema upgrade steps would be defined in the same
> place and way that other upgrade steps are currently defined so that
> they could even be interleaved with other kinds of upgrade steps.
>
> What I'm proposing here is that where we currently have 2 types of
> upgrade targets - AllMachines and StateServer - we introduce a third
> target called MasterStateServer which would be primarily
> (exclusively?) used for schema migration steps.
>
>>> - Non-master JobManageEnviron machine agents run their upgrade steps
>>>   as usual and then watch for EnvironConfig changes. They don't
>>>   consider the upgrade to be complete (and therefore let their other
>>>   workers start) until database-version matches agent-version. This
>>>   prevents the new version of the state server agents from running
>>>   before the schema migrations for the new software version have run.
>>
>> I'm not sure if schema should be done before or after other upgrade
>> steps. Given we're really stopping the world here, it might be prudent
>> to just wait to do your upgrade steps until you know that the DB
>> upgrade has been done.
>
> As mentioned above, with what I'm thinking there is no real distinction
> between schema migration steps and other types of upgrade steps so
> there's no concept of schema migrations happening before or after other
> upgrade steps.
>
> *Observations/Questions/Issues*
>
>>> - There are a lot of moving parts here. What could be made simpler?
>>>
>>> - What do we do if the master mongo database or host fails during the
>>>   upgrade? Is it a goal for one of the other state servers to take
>>>   over and run the schema upgrades itself and let the upgrade finish?
>>>   If so, is this a must-have up-front requirement or a nice-to-have?
>>
>> Some thoughts:
>>
>> 1. If the actual master mongo DB fails, that will cause reelection,
>>    which should cause all of the servers to get their connections to
>>    Mongo bounced, and then they'll notice that there is a new master
>>    who is responsible for applying the database changes.
>
> We will have to do some testing to ensure that this scenario actually
> works. Maybe I'm overthinking it, but my gut says there's plenty to go
> wrong here.
>
>> 2. If it is just the master Juju process that fails, I don't think
>>    there is any great expectation that a different process running the
>>    same code is going to succeed, is there?
>
> Agreed.
>
>> 3. There is also a fair possibility that the schema migration we've
>>    written won't work with real data in the wild. (We assumed this
>>    field was never written, but suddenly it is, etc.) We've talked
>>    about the ability to have Upgrade roll back, and maybe we could
>>    consider that here. Some possible steps are:
>>
>>    1. Copy the db to another location
>>    2. Try to apply the schema updates (either in place or only to the
>>       backup)
>>    3. If upgrade fails, roll back to the old version, and update the
>>       AgentVersion in environ config so that the other agents will try
>>       to "upgrade" themselves back to the old version. This would also
>>       be a reason to do the DB schema before actually applying any
>>       other upgrade steps. We probably want some sort of "could not
>>       upgrade because of" tracking here, so that it can be reported to
>>       the user
>
> I like this and it should work as long as there's enough storage
> available to make a copy of the database. I'm not exactly clear on how
> we would revert to the backup instance if the migration fails but I'm
> sure this can be made to work. It might be enough for the first
> iteration if we initially make some kind of backup that the user has
> access to that they can restore from manually.
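
To make the copy / apply / roll back idea above a bit more concrete, here is a rough Go sketch. It is only an illustration under stated assumptions: the DB type is an in-memory stand-in rather than a real mongo database, and DB, Migration, renameField and migrateWithRollback are hypothetical names that do not exist in Juju. It takes the "apply only to the copy" variant, so a failed step leaves the original data untouched and hands back a "could not upgrade" reason.

```go
package main

import "fmt"

// DB is a hypothetical in-memory stand-in for the real database:
// collection name -> documents. It only keeps the sketch self-contained.
type DB map[string][]map[string]string

// clone makes a deep copy, standing in for "copy the db to another location".
func (db DB) clone() DB {
	out := make(DB, len(db))
	for name, docs := range db {
		copied := make([]map[string]string, len(docs))
		for i, doc := range docs {
			c := make(map[string]string, len(doc))
			for k, v := range doc {
				c[k] = v
			}
			copied[i] = c
		}
		out[name] = copied
	}
	return out
}

// Migration is one schema upgrade step working on the raw data.
type Migration func(db DB) error

// renameField is an example step. It fails if the target field is already
// in use, i.e. the "we assumed this field was never written" case.
func renameField(coll, from, to string) Migration {
	return func(db DB) error {
		for _, doc := range db[coll] {
			if _, exists := doc[to]; exists {
				return fmt.Errorf("%s: field %q already in use", coll, to)
			}
			if v, ok := doc[from]; ok {
				doc[to] = v
				delete(doc, from)
			}
		}
		return nil
	}
}

// migrateWithRollback applies the steps to a copy of db. On success it
// returns the migrated copy; on failure it returns the untouched original
// together with the reason, so the caller can report it to the user.
func migrateWithRollback(db DB, steps []Migration) (DB, error) {
	work := db.clone()
	for i, step := range steps {
		if err := step(work); err != nil {
			return db, fmt.Errorf("could not upgrade: step %d: %w", i, err)
		}
	}
	return work, nil
}

func main() {
	db := DB{"machines": {
		{"series": "trusty"},
		{"series": "precise", "os": "already written"},
	}}
	live, err := migrateWithRollback(db, []Migration{renameField("machines", "series", "os")})
	if err != nil {
		fmt.Println("rolled back:", err)
	}
	fmt.Println("series of first machine:", live["machines"][0]["series"])
}
```

The sketch does not cover resetting AgentVersion so that the other agents downgrade themselves, and copying the whole database has the storage cost mentioned above.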
>
> As you mention, this would benefit from the DB schema steps being
> separate from the other upgrade steps. I have no real issue with this
> other than having them separate will probably mean more change to the
> existing upgrades package. This voids some of the things I've said
> earlier in this email :-) I'll think some more about how this could
> look.
>
>> 4. As long as we do some sort of "backup before applying the change"
>>    we allow users a way to recover the system if something failed. If
>>    we have proper Backup support integrated into core, one option is
>>    that we just trigger a backup and then upgrade in place; if stuff
>>    breaks, we at least have *something* that should be recoverable.
>
> It's a pity that the full Backup feature isn't there yet as this could
> be a nice way to get a first version of schema migrations working
> quickly.
>
>>> - Upgrade steps currently have access to State but I think this
>>>   probably won't be sufficient to perform many types of schema
>>>   migrations (e.g. accessing defunct fields, removing fields, adding
>>>   indexes etc). Do we want to extend State to provide a number of
>>>   schema migration helpers or do we expose mongo connections directly
>>>   to the upgrade steps?
>>
>> I believe the existing Upgrade logic actually has access to the API
>> not to State itself, so we'll need something there. The State object
>> has raw mongo collections on it (environs, charms, etc).
>
> The existing upgrade logic has access to both the API and State (the
> latter only on state machines obviously, that arg is nil otherwise) so
> that's already done.
>
>> DB Schema (IMO) inherently is going to be at the raw DB level, vs
>> changes in the abstract objects. (I expect that it will be defined in
>> terms of "apply this function to all entities in this collection",
>> rather than "iterate over Machine objects and set data on them".)
>> I could be wrong, but it does seem like we'll want the syntax of db
>> schema changes to be on mgo.Collection objects, and not on State
>> objects.
>
> I completely agree that we need schema migrations to work in the
> mongodb world and not via application level objects. Some schema
> migration tasks just won't make sense at the application object level.
>
> State doesn't expose its mgo collections to the outside though, so how
> would a schema migration step interact with them, especially for tasks
> such as adding new collections or indexes? Do we add a bunch of schema
> migration helper methods on to State (e.g. AddCollection(), AddIndex(),
> ApplyToCollection() etc), or do we add a single method which exposes
> the mongo database object (clearly marked as exclusively there for use
> by schema upgrade steps), or do we have schema migration steps pass a
> function that takes a mongo DB object to act on? We already expose the
> mongo session with MongoSession() so there is some precedent for this.
>
>>> - There is a possibility that a non-master state server won't
>>>   upgrade, blocking the master from completing the upgrade. Should
>>>   there be a timeout before the master gives up on state servers
>>>   upgrading themselves and performs its own upgrade steps anyway?
>>
>> Arguably this is a better case for "rollback" than "just move forward".
>
> Ok - sounds good.
>
>>> - Given the order of documents a juju system stores, it's likely that
>>>   the schema migration steps will be quite quick, even for a large
>>>   installation.
>>
>> "order of magnitude" right?
>
> Yes - sorry that wasn't very clear.
>
>> Yeah, we're talking megabytes, GB being really large, not many GB of
>> data.
>
> Great.
>
> Thanks for the excellent feedback.
>
> - Menno
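
For reference, here is a minimal Go sketch of how the pieces discussed in the quoted thread might fit together: a MasterStateServer target, schema steps written as functions over a raw database handle, and the database-version check that non-master state servers wait on. Everything here is an assumption for illustration only. RawDB is an in-memory stand-in for the mongo database object, and Step, runMasterSteps, upgradeComplete and the version strings are hypothetical, not the existing upgrades package API.

```go
package main

import "fmt"

// Target says where an upgrade step runs. AllMachines and StateServer are
// the targets named in the thread; MasterStateServer is the proposed one.
type Target string

const (
	AllMachines       Target = "allMachines"
	StateServer       Target = "stateServer"
	MasterStateServer Target = "masterStateServer"
)

// RawDB is a hypothetical stand-in for the raw mongo database object.
// Version plays the role of the proposed "database-version" value.
type RawDB struct {
	Version     string
	Collections map[string][]map[string]interface{}
}

// Step is one upgrade step. Schema steps get the raw database rather than
// application level objects.
type Step struct {
	Description string
	Target      Target
	Run         func(db *RawDB) error
}

// runMasterSteps runs only the MasterStateServer steps, then records the
// new schema version. With no matching steps the data is left alone and
// only the version number gets bumped.
func runMasterSteps(db *RawDB, steps []Step, version string) error {
	for _, s := range steps {
		if s.Target != MasterStateServer {
			continue
		}
		if err := s.Run(db); err != nil {
			return fmt.Errorf("%s: %w", s.Description, err)
		}
	}
	db.Version = version
	return nil
}

// upgradeComplete is the check a non-master state server would make
// before letting its other workers start.
func upgradeComplete(db *RawDB, agentVersion string) bool {
	return db.Version == agentVersion
}

func main() {
	db := &RawDB{Version: "1.0.0", Collections: map[string][]map[string]interface{}{
		"machines": {{"series": "trusty"}},
	}}
	steps := []Step{{
		Description: "add an example collection",
		Target:      MasterStateServer,
		Run: func(db *RawDB) error {
			db.Collections["example"] = nil
			return nil
		},
	}}
	fmt.Println("complete before:", upgradeComplete(db, "1.1.0"))
	if err := runMasterSteps(db, steps, "1.1.0"); err != nil {
		fmt.Println("upgrade failed:", err)
		return
	}
	fmt.Println("complete after:", upgradeComplete(db, "1.1.0"))
}
```

A failing step returns before the version is written, so the non-master servers would keep waiting. That is where a timeout or rollback decision would have to come in.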
--
Juju-dev mailing list
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
