On 9 March 2016 at 10:51, Mark Shuttleworth <[email protected]> wrote:

> Operational concerns
I still want 'juju-wait' as a supported, built-in command rather than as a
fragile plugin that I maintain and code embedded in Amulet that the
ecosystem team maintains. A thoughtless change to Juju's status reporting
would break all our CI systems.

> Core Model

At the moment logging, monitoring (alerts) and metrics involve customizing
your charm to work with a specific subordinate. And at deploy time, you of
course need to deploy and configure the subordinate, relate it, etc., and
things can get quite cluttered.

Could logging, monitoring and metrics be brought into the core model
somehow? e.g. I attach a monitoring service such as Nagios to the model,
and all services implicitly join the monitoring relation. Rather than
talking bespoke protocols, units use a 'monitoring-alert' tool to send a
JSON dict to the monitoring service (for push alerts), and there is some
mechanism for the monitoring service to trigger checks remotely. Requests
and alerts go via a separate SSL channel rather than the relation, as
relations are too heavyweight to trigger several times a second and may
end up blocked by e.g. other hooks running on the unit or jujud having
been killed by the OOM killer.

Similarly, we currently handle logging by installing a subordinate that
knows how to push rotated logs to Swift. It would be much nicer to set
this at the model level, and have tools available for the charm to push
rotated logs or stream live logs to the desired logging service. syslog
would be a common approach, as would streaming stdout or stderr.

The same goes for metrics, where a charm installs a cron job or daemon to
emit performance metrics as JSON dicts to a charm tool, which sends them
to the desired data store and graphing systems, maybe once a day or maybe
several times a second, rather than the current approach of assuming
statsd as the protocol and spitting out packets to an IP address pulled
from the service configuration.

> * modelling individual services (i.e. each database exported by the db
>   application)
> * rich status (properties of those services and the application itself)
> * config schemas and validation
> * relation config
>
> There is also interest in being able to invoke actions across a relation
> when the relation interface declares them. This would allow, for example,
> a benchmark operator charm to trigger benchmarks through a relation
> rather than having the operator do it manually.

This is interesting. You can sort of do this already if you set up ssh so
units can run commands on each other, but network partitions are an issue.
Triggering an action and waiting on the result works around this problem.

For failover in the PostgreSQL charm, I currently need to leave requests
in the leader settings and wait for units to perform the requested tasks
and report their results using the peer relation. It might be easier to
coordinate if the leader were able to trigger these tasks directly on the
other units.

Similarly, most use cases for charmhelpers.coordinator or the coordinator
layer would become easier. Rather than using several rounds of leadership
and peer relation hooks to perform a rolling restart or rolling upgrade,
the leader could trigger the operations remotely, one at a time, via a
peer relation.
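To give a feel for the current dance, the leader-settings approach boils
down to something like the following. This is a simplified sketch with
made-up names (the 'replication' peer relation, the hook functions and
repoint_replication are illustrative), not the actual PostgreSQL charm
code:

    from charmhelpers.core import hookenv


    def repoint_replication(new_master):
        # Placeholder for the real replication work done by the charm.
        pass


    def request_failover(new_master):
        # Runs on the leader. There is no way to tell a peer to do anything
        # directly, so the request is left in leader settings for the peers
        # to notice.
        if hookenv.is_leader():
            hookenv.leader_set({'failover-request': new_master})


    def leader_settings_changed():
        # Runs on the other units when leader settings change.
        request = hookenv.leader_get('failover-request')
        if not request:
            return
        repoint_replication(request)
        # Report completion back over the peer relation (assumed to be
        # called 'replication' here).
        for rid in hookenv.relation_ids('replication'):
            hookenv.relation_set(rid, {'failover-done': request})


    def failover_complete(new_master):
        # Runs on the leader in later hooks: have all peers acknowledged?
        for rid in hookenv.relation_ids('replication'):
            for unit in hookenv.related_units(rid):
                if hookenv.relation_get('failover-done', unit, rid) != new_master:
                    return False
        return True

Several rounds of leader-settings-changed and peer relation hooks like
this are needed before the leader knows everyone has finished; being able
to trigger an action on each peer and wait on its result would collapse
that into a single step.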
> Storage
>
> * shared filesystems (NFS, GlusterFS, CephFS, LXD bind-mounts)
> * object storage abstraction (probably just mapping to S3-compatible APIs)
>
> I'm interested in feedback on the operations aspects of storage. For
> example, whether it would be helpful to provide lifecycle management for
> storage being re-assigned (e.g. launch a new database application but
> reuse block devices previously bound to an old database instance). Also,
> I think the intersection of storage modelling and MAAS hasn't really been
> explored, and since we see a lot of interest in the use of charms to
> deploy software-defined storage solutions, this probably will need
> thinking and work.

Reusing an old mount on a new unit is a common use case. Single-unit
PostgreSQL is the simplest case here: the charm detects that an existing
database is on the mount and, rather than recreating it, fixes permissions
(uids and gids will often not match), mounts it, and recreates any
resources the charm needs (such as the 'nagios' user, so the monitoring
checks keep working).

But if you deploy multiple PostgreSQL units reusing old mounts, what do
you do? At the moment, the mount attached to whichever unit is lucky
enough to be elected master gets used and the others are destroyed.

Cassandra is problematic, as the newly provisioned units will have
different positions and ranges in the replication ring, and the existing
data will usually belong to other units in the service. It would be
simpler to create a new cluster, then attach the old data as an 'import'
mount and have the storage hook load it into the cluster. That requires
twice the disk space, but it means you could migrate a 10-unit Cassandra
cluster to a new 5-unit Cassandra cluster. (The charm doesn't actually do
this yet; this is just speculation on how it could be done.) I imagine
other services such as OpenStack Swift would be in the same boat.
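For reference, the 'reuse an existing database found on the mount' path
boils down to something like this. It is a simplified sketch, not the
actual charm code; the storage name, paths and PostgreSQL version are made
up for illustration:

    import grp
    import os
    import pwd
    import subprocess

    DATA_DIR = '/srv/pgdata/9.5/main'   # assumed mount point of the attached storage


    def data_storage_attached():
        if os.path.exists(os.path.join(DATA_DIR, 'PG_VERSION')):
            # An existing database is already on the mount. The postgres
            # uid/gid on this fresh unit will often not match the old unit,
            # so reown the tree instead of recreating the database.
            uid = pwd.getpwnam('postgres').pw_uid
            gid = grp.getgrnam('postgres').gr_gid
            os.chown(DATA_DIR, uid, gid)
            for root, dirs, files in os.walk(DATA_DIR):
                for name in dirs + files:
                    os.chown(os.path.join(root, name), uid, gid)
        else:
            # Fresh storage: initialise a new cluster as usual.
            subprocess.check_call(
                ['sudo', '-u', 'postgres',
                 '/usr/lib/postgresql/9.5/bin/initdb', '-D', DATA_DIR])
        # ...then restart PostgreSQL and recreate charm-managed resources,
        # such as the 'nagios' user used by the monitoring checks.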
--
Stuart Bishop <[email protected]>