On 19/03/16 01:02, Stuart Bishop wrote:
> On 9 March 2016 at 10:51, Mark Shuttleworth <[email protected]> wrote:
>
>> Operational concerns
>
> I still want 'juju-wait' as a supported, builtin command rather than
> as a fragile plugin I maintain and as code embedded in Amulet that the
> ecosystem team maintain. A thoughtless change to Juju's status
> reporting would break all our CI systems.
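(For context, what juju-wait and those CI systems hang on is essentially
polling 'juju status' until everything settles. A stripped-down sketch,
assuming the Juju 2.0 JSON layout with 'applications' and 'workload-status'
keys; exactly the sort of detail a careless change to status reporting
would break:)

    import json
    import subprocess
    import time

    def wait_for_steady_state(poll_interval=10):
        """Poll 'juju status' until every unit reports an 'active' workload."""
        while True:
            raw = subprocess.check_output(['juju', 'status', '--format=json'])
            status = json.loads(raw.decode('utf-8'))
            states = [
                unit.get('workload-status', {}).get('current')
                for app in status.get('applications', {}).values()
                for unit in app.get('units', {}).values()
            ]
            if states and all(state == 'active' for state in states):
                return
            time.sleep(poll_interval)

The real plugin does more than this (idle agents, no hooks still running),
but the dependency on the status format is the fragile part.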
Hmm.. I would have thought that would be a lot more reasonable now that we
have status well in hand. However, the charms need to support status for it
to be meaningful to the average operator, and we haven't yet made good
status support a requirement for charm promulgation in the store. I'll put
this on the list to discuss.

>> Core Model
>
> At the moment logging, monitoring (alerts) and metrics involve
> customizing your charm to work with a specific subordinate. And at
> deploy time, you of course need to deploy and configure the
> subordinate, relate it etc., and things can get quite cluttered.
>
> Could logging, monitoring and metrics be brought into the core model
> somehow?
>
> e.g. I attach a monitoring service such as nagios to the model, and all
> services implicitly join the monitoring relation. Rather than talk
> bespoke protocols, units use the 'monitoring-alert' tool to send a JSON
> dict to the monitoring service (for push alerts). There is some
> mechanism for the monitoring service to trigger checks remotely.
> Requests and alerts go via a separate SSL channel rather than the
> relation, as relations are too heavyweight to trigger several times a
> second and may end up blocked by e.g. other hooks running on the unit
> or jujud having been killed by OOM.
>
> Similarly, we currently handle logging by installing a subordinate
> that knows how to push rotated logs to Swift. It would be much nicer
> to set this at the model level, and have tools available for the charm
> to push rotated logs or stream live logs to the desired logging
> service. syslog would be a common approach, as would streaming stdout
> or stderr.
>
> And metrics, where a charm installs a cronjob or daemon to spit out
> performance metrics as JSON dicts to a charm tool which sends them to
> the desired data store and graphing systems, maybe once a day or maybe
> several times a second. Rather than the current approach of assuming
> statsd as the protocol and spitting out packets to an IP address
> pulled from the service configuration.

Of the items on this list, I'm pretty comfortable with logging. The others
feel like they would require modifying the monitoring stack anyhow, beyond
the vanilla tools people have today. Logging is AFAICT relatively
standardised, so I can see us setting logging policy per model or per
application, and having the agents do the right thing.

>> There is also interest in being able to invoke actions across a relation
>> when the relation interface declares them. This would allow, for example, a
>> benchmark operator charm to trigger benchmarks through a relation rather
>> than having the operator do it manually.
>
> This is interesting. You can sort of do this already if you set up ssh
> so units can run commands on each other, but network partitions are an
> issue. Triggering an action and waiting on the result works around
> this problem.
>
> For failover in the PostgreSQL charm, I currently need to leave
> requests in the leader settings and wait for units to perform the
> requested tasks and report their results using the peer relation. It
> might be easier to coordinate if the leader was able to trigger these
> tasks directly on the other units.

Yes.
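(For reference, the request-and-report-back dance Stuart describes today
boils down to something like the sketch below; the setting names and the
payload are illustrative, but leader-set, leader-get and relation-set are
the standard hook tools:)

    import json
    import subprocess

    # These hook tools only work inside a Juju hook or action context.

    def request_task(task, target_unit):
        """On the leader: publish a request in the leader settings."""
        payload = json.dumps({'task': task, 'unit': target_unit})
        subprocess.check_call(['leader-set', 'request={}'.format(payload)])

    def perform_task(request):
        """Placeholder for the charm-specific work, e.g. promoting a replica."""
        return {'unit': request['unit'], 'task': request['task'], 'done': True}

    def handle_request():
        """On each unit, from leader-settings-changed or a peer relation hook:
        do the requested work and report the result over the peer relation."""
        raw = subprocess.check_output(['leader-get', 'request'])
        if not raw.strip():
            return
        result = perform_task(json.loads(raw.decode('utf-8')))
        subprocess.check_call(
            ['relation-set', 'result={}'.format(json.dumps(result))])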
On peers it should be completely uncontroversial, since these are the same
charm and, well, it should always work if the charm developer tested it :)
The slightly controversial piece comes with invoking actions across a
relation, because it starts to imply that a different charm can't be
substituted in on the other side of the relation unless it ALSO implements
the actions that this charm expects.

> Similarly, most use cases for charmhelpers.coordinator or the
> coordinator layer would become easier. Rather than using several
> rounds of leadership and peer relation hooks to perform a rolling
> restart or rolling upgrade, the leader could trigger the operations
> remotely one at a time via a peer relation.

Right. I'll take that as a +1 from you then :)

>> Storage
>>
>>  * shared filesystems (NFS, GlusterFS, CephFS, LXD bind-mounts)
>>  * object storage abstraction (probably just mapping to S3-compatible APIs)
>>
>> I'm interested in feedback on the operational aspects of storage. For
>> example, whether it would be helpful to provide lifecycle management for
>> storage being re-assigned (e.g. launch a new database application but reuse
>> block devices previously bound to an old database instance). Also, I think
>> the intersection of storage modelling and MAAS hasn't really been explored,
>> and since we see a lot of interest in the use of charms to deploy
>> software-defined storage solutions, this will probably need thinking and
>> work.
>
> Reusing an old mount on a new unit is a common use case. Single-unit
> PostgreSQL is simplest here - it detects that an existing database is on
> the mount, and rather than recreate it, fixes permissions (uids and
> gids will often not match), mounts it and recreates any resources the
> charm needs (such as the 'nagios' user, so the monitoring checks work).
> But if you deploy multiple PostgreSQL units reusing old mounts, what
> do you do? At the moment, the one lucky enough to be elected master
> gets used and the others are destroyed.
>
> Cassandra is problematic, as the newly provisioned units will have
> different positions and ranges in the replication ring, and the
> existing data will usually actually belong to other units in the
> service. It would be simpler to create a new cluster, then attach the
> old data as an 'import' mount and have the storage hook load it into
> the cluster. That requires twice the disk space, but means you could
> migrate a 10-unit Cassandra cluster to a new 5-unit Cassandra cluster.
> (The charm doesn't actually do this yet; this is just speculation on
> how it could be done.) I imagine other services such as OpenStack
> Swift would be in the same boat.

Yes, broadly speaking it seems the semantics of the old and the new service
with the old mounts are very app-specific. I don't have any brilliant ideas
for clean syntax on this front yet :)

Mark
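P.S. For concreteness, the detect-and-reuse flow Stuart describes for
single-unit PostgreSQL might look roughly like this in a storage-attached
hook; the paths, the initdb invocation and the ownership handling here are
illustrative assumptions rather than what the charm actually does:

    import os
    import pwd
    import subprocess

    # Illustrative names; the real charm's layout and tooling differ.
    DATA_DIR = '/srv/pgdata/9.5/main'
    PG_USER = 'postgres'

    def storage_attached():
        """From a <name>-storage-attached hook: reuse old data if present."""
        if os.path.exists(os.path.join(DATA_DIR, 'PG_VERSION')):
            # An existing cluster: fix ownership (uids/gids from the old
            # machine rarely match the new one) rather than re-initialise.
            uid = pwd.getpwnam(PG_USER).pw_uid
            gid = pwd.getpwnam(PG_USER).pw_gid
            os.chown(DATA_DIR, uid, gid)
            for root, dirs, files in os.walk(DATA_DIR):
                for name in dirs + files:
                    os.chown(os.path.join(root, name), uid, gid)
        else:
            # A fresh mount: initialise a new cluster (command is illustrative;
            # Debian/Ubuntu normally wraps this in pg_createcluster).
            subprocess.check_call(
                ['sudo', '-u', PG_USER,
                 '/usr/lib/postgresql/9.5/bin/initdb', '-D', DATA_DIR])

The multi-unit and Cassandra cases above are exactly where a simple check
like this stops being enough.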
