Moving this to the list, in case others have input.

Cheers,
Andrew
On Wed, Sep 25, 2013 at 1:22 PM, Andrew Wilkins <[email protected]> wrote:

> On Wed, Sep 25, 2013 at 10:54 AM, Tim Penhey <[email protected]> wrote:
>
>> On 25/09/13 08:15, William Reade wrote:
>> > On Tue, Sep 24, 2013 at 11:12 AM, Andrew Wilkins <[email protected]> wrote:
>> >
>> > Hi William, Tim,
>> >
>> > I'm looking at adding a couple of new MachineJobs as requested, to handle local-storage and firewaller. Here's what I'm thinking:
>> >
>> > - Add a "BootstrapMachineJobs" field to environs/cloudinit.MachineConfig; if nil, set the current default in cmd/jujud/bootstrap.go. This will be written to the bootstrap agent.conf, and consumed by jujud bootstrap-state.
>> >
>> > Question: machine jobs, or something more like environment capability flags? Because, really, that's why we need the custom jobs.
>> >
>> > * we should only firewall if the environment supports that. [0]
>> > * we should only run http storage if the environment doesn't provide it itself.
>> > * plausibly, in the case of the null provider, we should actually not even run an environ provisioner, and not bother to implement the InstanceBroker methods.
>>
>> I think I agree with William here; these are more associated with a provider than with just agent config.
>>
>> The null provider and local provider want to run http storage, but not firewallers.
>>
>> I also think it is fair that we shouldn't run an environ provisioner for the null provider. And with the addition of the precheck methods, there should never get to be a machine in state that would require the null provider to try. Let's not start what we don't need.
>
> SGTM
>
>> > All the above are pieces of info about the environment we could record clearly in state, and which should apply to any manager node we start in an HA context.
>> > Furthermore, across even non-management nodes, we can know it's not even worth bothering to run any non-environ provisioner if the environment can't supply new addresses; if we've got environment capabilities recorded in state, we can know what needs to be done at the time of machine creation.
>>
>> I'm not even sure that this information needs to be in state, at least for the first cut of it.
>>
>> Also, having multiple environ provisioners as they are currently defined is going to cause race conditions on starting/stopping containers unless we add extra metadata to state so one provisioner doesn't try to stop a machine another is starting. Actually, given the HA story, it is better to have two working collaboratively than a fail-over we have to manage.
>>
>> > This does *then* imply that the existing machine-creation methods are themselves talking the wrong language: rather than specifying jobs explicitly, we should be specifying... roles, maybe? ...and combining roles with environment capabilities internally to state.
>>
>> Well, the whole point of the jobs listing for the machines IS a reflection of the roles that the machine has. We just need more fine-grained roles: rather than "manage everything", we add a few.
>
> I think that conceptually, "capability" makes sense for some things more than job/role. In particular, "has the ability to manage firewalls" seems better expressed as a capability than as a job. However, I don't think it's really worthwhile changing code to match. A capability can be expressed as a job, even if it's *slightly* awkward. The fact that we're giving a machine-agent the job "ManageFirewall" implies that it has that capability.
>> > - Update agent.Conf's format-1.16 to read this, and
>>
>> FYI - note that if you change format-1.16, you also need to change the migrate method, or put this in the attribute map.
>
> Yep. As discussed on IRC, this could just as well be done with the key/value map. I kind of don't like adding required things into a key/value map, but on the other hand this is bootstrap-specific, and not something the machine-agent proper cares about. Not changing the format is good, too.
>
>> > - Update manual bootstrap to set machine jobs, including JobHostEnvironStorage and excluding JobManageEnvironFirewall.
>> > - Update local provider to add the JobHostEnvironStorage job.
>> >
>> > So if we have roles+capabilities, the machine agent stays nice and simple -- we just inject a machine with the "manager" role, which then gets its jobs calculated according to the environ's capabilities. But ofc we do still have to inject the capabilities at cloud-init time. Bah ;).
>>
>> We don't need to add the capabilities to the config. We could add them to the information that the machine API gets back. However, since the machine agents don't know what the environment is, it takes us back to storing the roles (jobs) in state.
>>
>> > So there's a couple of things that need to happen on upgrade:
>> > - For the local provider, add the JobHostEnvironStorage job to machine 0 if it doesn't have it.
>> > - For non-local, non-null providers, add JobManageEnvironFirewall to machine 0 if it doesn't have it.
>> >
>> > Is there existing code that does this? Where's appropriate? I know there's agent.conf migration, but I don't think that's really appropriate for this kind of upgrade. Environ.Validate could potentially do this, by checking old/new tools versions, connecting to state if it's machine 0 and making the necessary changes.
>> >
>> > We don't have good practice wrt upgrades.
>> > Given that the state package is not completely insulated behind the API, and so we can never guarantee that some agent or client is not going to swoop in and start changing the database, we've just been making very tentative additions and sometimes getting even those wrong. FWIW, the decision to upgrade *is* now taken behind the API, so we have some degree of control we did not have before, but it's still not foolproof.
>>
>> We need a state-side, server upgrade process defined. Enough of this ad-hoc jiggery-pokery.
>>
>> We also need a defined process for upgrades. I'm not sure how close we are to this right now, but I think we need something like this:
>>
>> 1) Put the API server into a state where it continues to serve requests, but doesn't accept new connections.
>> 2) The tool version is updated, causing all machine agents to kill themselves.
>> 3) We need some form of state-side lock to allow only one state server to modify the underlying structure, and a defined process of functions to run to modify the state documents to the next version. [1]
>>
>> This process needs to be defined, and stable, such that we don't delete it all when the next minor branch commit is done.
>>
>> 4) When the state servers have been upgraded, we then kick off the API servers, which the machine agents can then connect to.
>
> This sounds sane to me.
>
>> > I think it's probably simplest to do a one-shot post-upgrade job-update operation in the machine agents whose jobs are changing meaning (for versions 1.15/1.16).
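[Editor's note] The state-side lock from step 3 above amounts to a compare-and-set on a recorded schema version: only the state server that wins it may migrate. A minimal sketch, with a mutex-guarded struct standing in for what would really be a mongo metadata document and transaction assert (all names invented):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// metaStore stands in for the state metadata document holding the
// schema version and the upgrade lock.
type metaStore struct {
	mu      sync.Mutex
	version int
	locked  bool
}

var errLockHeld = errors.New("upgrade already in progress")

// acquire takes the upgrade lock iff the stored version matches what
// the caller expects, mimicking a txn assert on the version field.
func (s *metaStore) acquire(expect int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locked {
		return errLockHeld
	}
	if s.version != expect {
		return fmt.Errorf("version is %d, expected %d", s.version, expect)
	}
	s.locked = true
	return nil
}

// release bumps the version and drops the lock once migration succeeds.
func (s *metaStore) release(next int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.version = next
	s.locked = false
}

func main() {
	st := &metaStore{version: 1}
	if err := st.acquire(1); err != nil {
		panic(err)
	}
	// A second state server racing for the same upgrade loses.
	fmt.Println(st.acquire(1))
	st.release(2)
	fmt.Println(st.version)
}
```

In the real system the winner would then run the document migrations of step 3 before the API servers of step 4 come back up.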
>> > The machine agents each have control over their machine's documents, and they're the only things that react interestingly to machine jobs regardless, so they're perfectly suited to updating the documents; and the machine agents *are* where we apply the hacks today, so it's quite convenient to have the same component make the appropriate fixes to state before being retired for 1.17 and onwards.
>>
>> See above, and I don't think we should be retiring the code too soon.
>>
>> > So maybe: add and make an UpdateJobs API call, somewhere before we call MachineAgent.APIWorker and get the Jobs we're expected to run, and schedule the code to be deleted after 1.16; old code will still read the jobs it expects, new code won't run until the additions have been made, and everyone will be happy. I think.
>> >
>> > BTW, the idea of Environ.Validate connecting to state breaks my brain a little; I'd very strongly prefer not to do that.
>> >
>> > Not sure if all that is helpful, or whether it just obscures things. Ping me in the morning and we can talk if necessary.
>>
>> I also have something that will need to be installed on all machines as part of the upgrade procedure.
>>
>> New installs will have the cpu-checker package installed, and will have done some rudimentary checks when the machine agent has come up; however, we need a place to add new packages that are required to be installed, or new apt sources defined (like the cloud-tools archive).
>>
>> Perhaps this whole piece of work fits under the "major version upgrades" headline, as once we have this process and procedure in place, major versions just become a number we may change periodically, as any version may update the state document structure.
>
> Yep. After I sent the email yesterday, I began thinking that this upgrade functionality is going to be exactly what's needed for updating the state schema.
> I've got a few things to finish off (authenticated httpstorage is half done; I still need to document manual provisioning). Pending cloud-installer work, I can start looking into this in a bit more detail.
>
> Vague ideas at the moment:
> - Add a version to the state database (I suppose there'd need to be some kind of metadata document collection), to track required schema changes.
> - Add a state/upgrade package, which keeps a full history of point-to-point schema updates required. We iterate through version changes, applying upgrade steps one at a time. Everything must be done in a transaction, naturally.
> - One API server will (with a global lock):
>   * First upgrade the state database. All other code can be written to assume the current database schema.
>   * Invoke an EnvironUpgrader interface method, optionally implemented by an Environ. This interface defines a method for upgrading some provider-specific aspects of the environment (e.g. going through and adding jobs to all of the state-server machines). The EnvironUpgrader will similarly need to keep track of versions, and point-to-point upgrades.
>
>> [1] I think the stance of only supporting upgrades to the next public release is crackful. Consider a user that has a working juju install and has not needed to poke it for ages. They are on 2.2.0. They read the latest juju docs that show new amazing juju-gui stuff that requires 2.10.0. We should not make them go 2.2 -> 2.4 -> 2.6 -> 2.8 -> 2.10, as that is clearly a terrible end user experience.
>
> From a user POV, that sounds pretty horrible.
--
Juju-dev mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
