Hello cloud-init -- Thanks to all who attended the cloud-init summit this past week; thank you for your participation. I wanted to follow up on-list with the notes from the event, included inline in the body of this email. Please feel free to reply with any questions.
(I've included everyone who attended on bcc, in case you are wondering how you got this.) One request I have for people is to join the cloud-init mailing list. It's very low volume. You can join with a Launchpad account here: https://launchpad.net/~cloud-init

Here are the notes:

-------------------

cloud-init Summit 2017 Notes

[[TOC]]

# Links

* **Hangouts link**: [https://g.co/hangout/google.com/cloud-init][0]
* [Shared Notes][1] [[https://goo.gl/tngepy][2]] (**This doc**)
* [Welcome Slides & Agenda][3] [[https://goo.gl/zgw4Ug][4]]
* [Pre Summit Bug List][5] [[https://goo.gl/QQfXE6][6]]
* [Cloud-init Trello Roadmap][7] [[https://trello.com/b/W1LTVjQG/cloud-init-roadmap][8]]
* [Cloud-init Trello Daily][9] [[https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin][10]]
* Copr repos: [https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/][11]

[0]: https://g.co/hangout/google.com/cloud-init
[1]: https://goo.gl/tngepy
[2]: https://goo.gl/tngepy
[3]: https://goo.gl/zgw4Ug
[4]: https://goo.gl/zgw4Ug
[5]: https://goo.gl/QQfXE6
[6]: https://goo.gl/QQfXE6
[7]: https://trello.com/b/W1LTVjQG/cloud-init-roadmap
[8]: https://trello.com/b/W1LTVjQG/cloud-init-roadmap
[9]: https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
[10]: https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin
[11]: https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/

# Main Sessions, Day 1 -- Thursday, Aug 24, 2017

## Attendees Info

* Steve Zarkos (Azure) [stevez]
* Daniel Sol (Azure PM, Azure Linux - azurelinuxagent) / [email protected]
* Stephen Mathews (Softlayer)
* Scott Moser (Canonical) [smoser] US/Eastern
* Sankar Tanguturi (VMware @stanguturi) - Wants to replace a home-grown configuration engine with cloud-init
* Robert Schweikert (SUSE) - Technical team lead for the public cloud team; carries lots of patches to cloud-init that they want to upstream.
[robjo / [[email protected]][12] ] * Andrew Jorgensen (AWS and Amazon Linux) [ajorg / ajorgens] * Zach Marano (GCE) - GCE Guest OS Images Lead [[[email protected]][13], TZ=US/Seattle] * Max Illfelder (GCE) - GCE Guest env author [[[email protected]][14], TZ=US/Seattle] * Ryan Harper (Canonical) [rharper] [TZ=US/Chicago] * Scott Easum (Softlayer) * [Robert Jennings][15] (Canonical) cloud image delivery [rcj@irc/[launchpad][16], rcj4747@elsewhere TZ=US/Chicago] * Matt Yeazel (AWS - Amazon Linux team) [yeazelm] * David Britton (Canonical server team mgr) [dpb1] TZ=US/Mountain * Josh Powers (Canonical server team eng) [powersj] TZ=US/Pacific * Chad Smith (Canonical cloud-init eng) - blackboxsw TZ=US/Mountain * Paul Meyer (Azure Linux) [paulmey@irc,github,microsoft.com pfameyer@twitter] * Lars Kellogg-Stedman (Red Hat), larsks@(irc,github,twitter, etc.), [[email protected]][17] * Ryan McCabe (RH) - rmccabe [12]: mailto:[email protected] [13]: mailto:[email protected] [14]: mailto:[email protected] [15]: mailto:[email protected] [16]: https://pad.lv/~rcj [17]: mailto:[email protected] ## Recent Features / Roadmap / Q&A [David Britton] * [https://goo.gl/zgw4Ug][18] * Lars: version numbering (e.g. .1 releases hard to sell) * Lars: external scripts versus inside cloud-init… moving ds-identify knowledge back into cloud-init. * Lars: slight concern about netplan as primary format w/ special handling logic * Lars: There are webhooks in COPR that could trigger per-commit builds for CI. Canonical’s CI does per-commit testing. Details in Josh’s topic * Lars: OpenStack surfaces 3rd-party CI mechanisms, can we do this with cloud-init upstream CI? * Smoser: Lot’s of interest from the community in querying metadata (like instance-id), it might be nice for cloud-init to provide that information. * Lars: want to cleanly separate collection of data from cloud-init from acting on that metadata. 
  I want a tool to dump my cloud X’s metadata in a unified/standard format (trello card: [https://trello.com/c/AYaCdQyT][19])
* Robert: might we look at integrating existing tooling for metadata parsing? GCEmetadata: [https://github.com/SUSE/Enceladus/tree/master/gcemetadata/][20] AWSMetadata: [https://github.com/SUSE/Enceladus/tree/master/ec2utils/ec2metadata][21]
* Paul/Stephen: Want to discuss eventing / hotplug operations
* Lars: interested in KVM CI supporting multiple distros

[18]: https://goo.gl/zgw4Ug
[19]: https://trello.com/c/AYaCdQyT
[20]: https://github.com/SUSE/Enceladus/tree/master/gcemetadata/
[21]: https://github.com/SUSE/Enceladus/tree/master/ec2utils/ec2metadata

## cloud-init integration overview [Josh Powers]

* [https://goo.gl/vbGrjY][22]
* Integration tests design original doc: [https://goo.gl/qVhSrq][23]
* CI injects the tested cloud-init into an image, boots the image with provided test cloud-config content, and runs collect scripts after cloud-init runs.
* Harvested output comes back to the host CI server, where nose is run to check for the expected test output.
* We obtain images and create customized lxc snapshots from [http://cloud-images.ubuntu.com/][24]
* Jenkins CI view - [https://jenkins.ubuntu.com/server/view/cloud-init/][25]
* Robert (SUSE) - Are tests consumable by others? SUSE has an Image Proofing App tool ([IPA][26]) which wraps custom unit tests and needs to drive them with custom configs or operations (like restarting or rebooting instances). If cloud-init tests are available as a separate consumable package, the IPA tool could source tests and augment the tests or images as needed: [https://github.com/SUSE/ipa][27]
* Paul mentions that Azure has a test framework that they use for cloud-init testing and for how it interacts with their WAL agent.
* Smoser: Integration test wants: on LXC we can jump in out-of-band (sans SSH) to validate instances, but cloud/kvm deployments need an ssh-key setup for accessing the instance under test.
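A test case in this style pairs cloud-config user data with collect commands whose output is later asserted on; a hypothetical sketch (file layout and key names invented for illustration, not the CI's actual schema):

```yaml
# example_test.yaml -- hypothetical integration test-case layout
cloud_config: |
  #cloud-config
  write_files:
    - path: /tmp/summit.txt
      content: "hello from cloud-init"
collect_scripts:
  # each command runs inside the booted instance; stdout is harvested
  # back to the CI host, where assertions are run against it
  result: cat /tmp/summit.txt
  boot_result: cat /run/cloud-init/result.json
```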
[22]: https://goo.gl/vbGrjY
[23]: https://goo.gl/qVhSrq
[24]: http://cloud-images.ubuntu.com/
[25]: https://jenkins.ubuntu.com/server/view/cloud-init/
[26]: https://github.com/SUSE/ipa
[27]: https://github.com/SUSE/ipa

## Decreasing Boot Time Overview [Ryan Harper]

* [https://goo.gl/92ghBa][28]
* Boot stages: [in tree docs][29]
* TZ in systemd: [hacker news post][30]
* When did the locale-generation fix land? rharper: "now". Daily cloud-images contain updated cloud-init and the pre-generated locale fix. Stable releases of Xenial and Zesty will follow on after an Ubuntu SRU.
* smoser: readurl - it would be good to log each url request, and then have the ability to show all the urls read and the time taken for each.
* An existing method/function can be wrapped with util.log_time to generate granular events which could be interpreted by cloud-init analyze.
* Lars: might want a function decorator to facilitate that more easily
* Want an optional mechanism to turn on deeper analysis (like strace collection from all cloud-init actions)
* Don’t want to impact all cloud-init runs w/ analysis
* RE: the execve() analysis, it would be nice to be able to optionally group by positional arguments as well
* For the module use slide: Ryan was going to collect the module import flame graphs w/ snakefood to track whether we can distill python modules to the minimal set required for cloud-init boot-stage functionality. Fewer file imports == faster cloud-init.
* Currently cloud-init is busted in trunk :-( ([http://paste.ubuntu.com/25384304/][31])
    * LP: #[1712676][32]
* Do we track/count the # of touches of the metadata service?
* Azure: We want cloud-init analyze coverage of datasource.get_data calls
* Robert: What is the total time of cloud-init during the boot? (basically, what is systemd-analyze blame overall?) - It is significant; some things we can improve, some things we can’t. The goal of this project is to give us the data, not to fix every tiny performance problem in cloud-init.
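The decorator idea Lars raises above might look roughly like this (a minimal sketch; cloud-init's real util.log_time has its own signature, and the function name below is invented):

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def log_time(func):
    """Record wall-clock duration of func as a granular timing event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # a real implementation would emit an event cloud-init analyze
            # could consume; here we just log the duration
            LOG.info("%s took %.3fs", func.__name__, time.monotonic() - start)
    return wrapper


@log_time
def fetch_metadata():
    # stand-in for a datasource's get_data() call
    time.sleep(0.01)
    return {"instance-id": "i-123"}
```

This keeps timing out of the function bodies themselves, so any callable can opt in with one line.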
* Smoser: Long-term we want to run trend analysis against previous CI runs to watch for negative impacts to bringup
* Amazon: module trimming would be compelling, as some disks are remote to instances and significant slowness can be introduced in the init-local timeframe due to remote file loads
* Ryan/dpb: might look at lazy-loading modules only when needed, or pruning the most significant module from each cloud-init stage to trim it a bit.

[28]: https://goo.gl/92ghBa
[29]: http://cloudinit.readthedocs.io/en/latest/topics/boot.html
[30]: https://news.ycombinator.com/item?id=13697555
[31]: http://paste.ubuntu.com/25384304/
[32]: https://pad.lv/1712676

## Cloud-init schema validation [Chad Smith]

* [https://goo.gl/jm7Tec][33]
* Cloud-config is the user’s primary interface with cloud-init, so look at how to improve that user experience. Catch errors and issues earlier.
* Each module has its schema defined in it
* Path to get the schema validated in the modules: the hope is to build in unit testing (which will be run by CI) to exercise each key that was added, such that they are all tested.
* Lars: merging data - how are we validating the merging of a variety of user-data; are you getting the data you expect? Suggestion: document merging behavior, and show a demo/example of how to test that.
    * [https://cloudinit.readthedocs.io/en/latest/topics/merging.html][34]
    * Potential subcommand for a merge tool
* Amazon: Does this only look at yaml files? Currently "yes". See above for our intent for a "merge" subcommand which will perform merges of vendor_data with all user_data parts to show the coalesced cloud-config object.
* Lars: we want to see merging of vendor_data, substrate/metadata, etc. to verify the mechanism behaves the way we expect it to
* RobertS: image vendors would like to use a tool to provide known vendor_data, user-data, etc. to ensure expected behavior. That is an evolution beyond just syntactic correctness.
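The per-module key checking being discussed could look like this trimmed-down sketch (the schema shape and key names here are invented; cloud-init's actual validation is richer than a flat type check):

```python
# Hypothetical per-module schema: key name -> (required type, key required?)
SCHEMA = {
    "path": (str, True),
    "content": (str, False),
    "permissions": (str, False),
}


def validate(cfg, schema=SCHEMA):
    """Return a list of human-readable schema errors (empty list == valid)."""
    errors = []
    for key, (expected, required) in schema.items():
        if key not in cfg:
            if required:
                errors.append("missing required key: %s" % key)
        elif not isinstance(cfg[key], expected):
            errors.append(
                "key %s: expected %s, got %s"
                % (key, expected.__name__, type(cfg[key]).__name__)
            )
    for key in cfg:
        if key not in schema:
            errors.append("unknown key: %s" % key)
    return errors
```

Returning all errors at once (rather than raising on the first) is what lets the user see every problem in their cloud-config before a boot is wasted.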
* Robert: Not being able to overwrite certain keys might not be such a bad thing?
    * Differing opinions on this around the room
* Robert: The web service could have a catalog of default vendor configuration, so users could see the assembled file and check it all for correctness.

[33]: https://goo.gl/jm7Tec
[34]: https://cloudinit.readthedocs.io/en/latest/topics/merging.html

## Version Numbering [Lars Kellogg-Stedman]

* Version numbers flag stability and delta (risk) for the consumer of the product
* 0.7.6 -> 0.7.9 are completely different products; a bump in the last digit doesn’t really convey that
* Looking for a deliberate/explicit version numbering scheme
* Proposed: usage of [semantic versioning][35]
* smoser: focused on backwards compatibility; therefore we should be moving up the minor version for new features, instead of the patch version as has been done in the past.
* Proposal: roll over to 1.0.0 and start using semantic versioning
* [Given a version number MAJOR.MINOR.PATCH, increment the:][36]
    * MAJOR version when you make incompatible API changes,
    * MINOR version when you add functionality in a backwards-compatible manner, and
    * PATCH version when you make backwards-compatible bug fixes.
    * Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
* Lars: Having an established release schedule would also be very helpful. Distributions can better package releases around a schedule. Makes it easier to justify pulling in fixes.
* Lars: Does a development model using branches make sense as well? E.g. having master freeze before a release, and a devel branch for pushing new things in the meantime.
* Instead, hide new features behind a command line flag and turn them on formally once tested (Lars likes this)
* Certain vendors carry large patch loads; how and where do we host those patches long-term?
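As a worked example of the MAJOR.MINOR.PATCH rules above (a sketch only; the function name is invented):

```python
def bump(version, part):
    """Return version with the given part ("major", "minor", "patch")
    incremented per semver; lower-order parts reset to zero."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return "%d.0.0" % (major + 1)
    if part == "minor":
        return "%d.%d.0" % (major, minor + 1)
    if part == "patch":
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown part: %s" % part)
```

Under this scheme a backwards-compatible feature on 0.7.9 would have produced 0.8.0 rather than 0.7.10, and the proposed rollover starts a new line at 1.0.0.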
[35]: http://semver.org/
[36]: http://semver.org/spec/v2.0.0.html

## external scripts vs in-program [Lars Kellogg-Stedman]

* ds-identify feels like a duplication of datasource discovery; is there a way to roll that logic into the Datasource so that there isn’t that duplication?
* The crux of the matter was the execution time of python loading all cloud-init modules on slow-disk systems like raspi, etc., compared to dash (for what is in effect simple reads of /proc, /sys, etc.).
* ds-identify is run as a generator and only enables a datasource to run if the underlying substrate is discovered to be compatible with that datasource.
* [ACTION] RCJ: Side issue, can unit tests add shellcheck for ds-identify?

## cloud-init QA: Provider tests [Josh]

* [https://goo.gl/2BHACr][37]
* Want to determine how best to integrate specific vendor/cloud image testing upstream
* [ACTION] Softlayer will grab information on CI to present tomorrow
* Softlayer: Provision - create a template and then start an instance
* RobertS: If tests are consumable he would run them on openSUSE images and ensure they publish results. Integration tests running against clouds should have a framework which supports test result validation for different distro/image vendor expectations.
* Looking for best practices for interacting with a given cloud:
    * How best to boot a custom image
* [ACTION - Josh] Put together an email with requests, and send to cloud providers describing how to run tests and asking for best practices or merge proposals for remote image testing
* KVM merge proposal: [https://code.launchpad.net/~powersj/cloud-init/+git/cloud-init/+merge/327646][38]
* Full integration test run times: LXD 12 minutes; KVM total time cost not measured yet (test merge proposal in flight^)

[37]: https://goo.gl/2BHACr
[38]: https://code.launchpad.net/~powersj/cloud-init/+git/cloud-init/+merge/327646

## cloud-init QA: Distro Tests, how to build your own CI [Josh]

* [https://goo.gl/dzQRHg][39]
* Questions:
    * Lars: There are webhooks in COPR that could trigger per-commit builds for CI. Canonical’s CI does per-commit testing. Details in Josh’s topic.

[39]: https://goo.gl/dzQRHg

## Python 3 [Robert Schweikert]

* [https://pythonclock.org/][40]
* [https://www.python.org/dev/peps/pep-0394/][41]
* From a distribution perspective, SLES 11 (old) is on python 2.6.9 and is going EOL in March 2018. And python2 is going EOL in 2 years 7 months (see pythonclock), meaning clouds/distros will likely start dropping support for py2.
* SLES 15 will be python 3 next year, with python2 in the legacy module with 2 years of support
* At some point python2 support will be a "don’t care" for most distros & clouds.
* Lars: RHEL6 may have a longer lifecycle than SLES, but it is pinned at an older version of cloud-init, so upstream cloud-init could drop 2.6 shortly
* Lars: RHEL7 still only has python 2.7, so RHEL still cares about 2.7 for a while (June 2024)
* smoser: python2.7 is probably present until RHEL7 has python3. Nobody at the cloud-init summit cares about 2.6 support.
* SLES: Today distro vendors have an increased QA support matrix to validate python 2.6 versus python 3 support.
  SLES 11-12 separated the distro into modules which have different update/backport policies. The cloud module (Robert’s group) has a CI exception/agreement for updates, so cloud-init can be moved to new versions as needed.
* [AGREED] Python 2.6 support limited to ~18 months; 2.7 will continue for RHEL7 for a while unless RHEL7 can pull in a python3 version.
* [ACTION - Lars] Determine whether python3 support will be introduced in RHEL7 at some point, or only in RHEL8

[40]: https://pythonclock.org/
[41]: https://www.python.org/dev/peps/pep-0394/

## Using LXD for rapid dev/testing [Scott]

* [https://goo.gl/3sJuX9][42]
* Getting started: [https://linuxcontainers.org/lxd/getting-started-cli/][43]
* Images: [https://us.images.linuxcontainers.org/][44]
* Ubuntu Daily Images: [https://cloud-images.ubuntu.com/daily/][45]
* Stephane’s DebConf presentation: [https://debconf17.debconf.org/talks/53/][46]
    * Good overview of basics and some advanced features
* Scott: Would be really nice to have other OS images with cloud-init already in them, like the Ubuntu daily images. Makes cloud-init development with them much easier.
* It would be nice if distro vendors interested in cloud-init could provide "official" LXD images for their distributions.
* How do we create an LXD image? [tarball including rootfs + metadata, or squashfs][47]
* RobertS: might be able to teach the kiwi build service to publish lxd images. If there is an images endpoint, LXD could crawl it, etc.
* Can we serve images from our own endpoint? Yes, either by implementing the [REST API][48] or by providing a simplestreams index
* Lars: Thoughts about mocking/faking the metadata service for quick testing?
    * Would love contributions of mock metadata services from cloud providers!
    * Chad: Serve up an instance on a cloud, harvest the metadata, and then use that data for serving up to tests.
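A mock metadata service of the kind Lars describes can be tiny; a stdlib-only sketch (the paths and canned values are invented, loosely EC2-flavored — a real mock would serve data harvested from an actual instance):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Canned responses; harvested real-instance data would replace these.
FAKE_METADATA = {
    "/latest/meta-data/instance-id": "i-0123456789abcdef0",
    "/latest/meta-data/local-hostname": "test-host",
}


class MockMetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = FAKE_METADATA.get(self.path)
        self.send_response(200 if body is not None else 404)
        self.end_headers()
        if body is not None:
            self.wfile.write(body.encode())

    def log_message(self, *args):  # keep test output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), MockMetadataHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/latest/meta-data/instance-id" % server.server_port
instance_id = urlopen(url).read().decode()
server.shutdown()
```

Pointing a datasource's metadata URL at a server like this lets a test exercise the crawl path without booting on a real cloud.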
* smoser/lars: use docker official images for rhel/centos testing

[42]: https://goo.gl/3sJuX9
[43]: https://linuxcontainers.org/lxd/getting-started-cli/
[44]: https://us.images.linuxcontainers.org/
[45]: https://cloud-images.ubuntu.com/daily/
[46]: https://debconf17.debconf.org/talks/53/
[47]: https://github.com/lxc/lxd/blob/master/doc/image-handling.md
[48]: https://github.com/lxc/lxd/blob/master/doc/rest-api.md

## How to query metadata [Scott]

* [https://trello.com/c/AYaCdQyT/21-cloud-init-query-standardized-information][49]
* Might be nice for cloud-init to surface metadata, since it crawls it for most datasources anyway, so that other tools don’t have to do the same.
* Datasources currently crawl, react, and cache metadata in a pickled object on the filesystem. We would like to query cloud-init for the cached (or live) metadata and ultimately produce a unified JSON structure on the filesystem to allow other tools to parse metadata.
* Originally there existed a ‘cloud-init query’ subcommand, but it was never implemented
* AndrewJ: potentially dump standard-format keys and custom "raw" content within the same structure
* smoser: might have security concerns about leaking sensitive information if we dump it all in a single blob; maybe we’d like to separate it
* Why pickle? It keeps the class on disk, so cloud-init local can check instance_id to validate whether cloud-init needs to be re-run
* [ACTION - blackboxsw] Path forward: cloud-init supports loading either obj.pkl or a JSON object, and writes JSON instead of obj.pkl for new releases. When writing JSON, remove the obj.pkl file
* AndrewJ: Leveraging datasource get_data logic to handle retries or waits etc. would be a big win for script writers, so they don’t have to bake that logic into their scripts.
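The unified JSON structure under discussion might look like this (a sketch only; the key names are invented, not a settled schema, and a real path might be /run/cloud-init/instance-data.json — a temp file keeps the sketch self-contained):

```python
import json
import tempfile

# Hypothetical unified instance-data layout: a top-level cloud-type, some
# standard keys, and a blob of cloud-specific "raw" keys, as suggested above.
instance_data = {
    "cloud-type": "ec2",
    "standard": {"instance-id": "i-0123456789abcdef0", "region": "us-east-1"},
    "raw": {"block-device-mapping": {"ami": "/dev/sda1"}},
}

# Write it where other tools could parse it without re-crawling the service.
tmp = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
json.dump(instance_data, tmp, indent=1)
tmp.close()

# Any consumer (shell script, config tool) can now load it back.
with open(tmp.name) as f:
    loaded = json.load(f)
```

Unlike obj.pkl, this file is readable by anything that speaks JSON, which is the point of the proposed switch.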
* [ACTION - blackboxsw] Datasource.crawl_metadata() branch
* [ACTION - blackboxsw] Design a schema specification for the unified metadata keys in cloud-init’s JSON object
* Wants: top-level cloud-type, some standard keys, a blob of cloud-specific keys, and a standard network config format

[49]: https://trello.com/c/AYaCdQyT/21-cloud-init-query-standardized-information

# Main Sessions, Day 2 -- Friday, Aug 25, 2017

## Bug Triage & Squashing [All]

* [https://goo.gl/QQfXE6][50]

[50]: https://goo.gl/QQfXE6
[51]: https://goo.gl/QQfXE6

## Device hotplug overview & feedback [Ryan]

* [https://goo.gl/WsBPkk][52]

[52]: https://goo.gl/WsBPkk

* Ajorg: There is no hierarchy of BOOT/INSTANCE/ALWAYS: if manually running cloud-init --file config.yaml --frequency always when the module is per-boot or per-instance, a sane hierarchy would prevent the module from running; the current semaphore handling (sem...<freq>) is too rudimentary for that hierarchy.
* GCE addresses this by documenting their interface and partitioning the config file in a way that makes clear which pieces are Google-managed; thus detection of changes by the user is possible, and then tooling can decide if it should still manage the config file
* In ssh authorized_keys files GCE adds dynamic metadata content with a comment tag: #Added by google…
* For iptables, Google tries to separate cloud changes from user-driven changes by namespace-scoping network definitions with a "proto66" prefix
* Need to take modules on a case-by-case basis to determine whether idempotent runs are the expected behavior when deciding whether to be re-entrant
* Spawning a background hotplug service should be explicitly default-disabled and ‘opt-in’ enabled by configuration, since upgraded environments may already be running hot-plug configuration behavior.
* Openstack adds disk and nic info to metadata
* AWS doesn’t surface updated disk info in metadata, but does add dynamic network info
* AndrewJ: Computers are kittens vs. cattle
* Andrew: One place where hotplug is interesting for him is if it were used to configure things all the time, not just when something was hotplugged, from a risk/maintenance/code-path perspective.
* AWS has the [ec2-net-utils][53] tool (and an [ubuntu port][54])
* SUSE project cloud-netconfig: [https://github.com/SUSE/Enceladus/tree/master/cloud-netconfig][55]

[53]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#ec2-net-utils
[54]: https://github.com/ademaria/ubuntu-ec2net
[55]: https://github.com/SUSE/Enceladus/tree/master/cloud-netconfig

## Network v2 yaml as primary format [Ryan]

* [Netplan config example][56]
* Having a common intermediary (which has a spec) to represent network configuration makes unit testing easier, as it’s a common spec that is published and understood (even outside of cloud-init)
* Vmacs don’t seem to be defined in the spec; does that preclude the netplan spec from describing such features? Answer: It’s a fluid spec that is currently being extended; it’s a merge proposal away.
* [Action - dpb] AndrewJ to follow up with Ryan & Pat about net-utils use cases for netplan

[56]: https://git.launchpad.net/netplan/plain/doc/example-config

## Breakout: JSON instance data storage for cloud-init, part 1 [chad]

* [https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/329589][57]
* Bikeshedding of function names
* Question around non-utf8 encoding and the handling of that
* How should we treat sensitive data (e.g. metadata)?
    * Option: treat it as a white list; assume not allowed and only show exceptions?
    * Option: if root, get it all; if not root, don’t
    * Option: say in the docs what is or isn’t missing, to avoid the phone call
* Initial cut will separate all user-data into a separate file from metadata, which will only be readable by root.
  Will iterate on a path-based blacklist for known sensitive data and extract that out of the metadata blob into /run/cloud-init/instance-data-sensitive.json

[57]: https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/329589

## How can cloud-init improve & feedback [All]

* GCE: Have we made progress on shutdown-related actions?
    * Certain clouds have shutdown script needs (poor-man’s backups, scale-up/scale-down, etc.)
    * No, this has not had progress. It’s in our backlog now.
* (For Canonical, not cloud-init) - the Ubuntu user is problematic in many places.
* Speed of language (python) vs. others (golang)
* Parallel execution of jobs, which would require defining/using a dependency chain
* How can we get distros to maintain cloud-init better? Can’t seem to get distros to all be on the same page with distro support behavior
* Finding the source for cloud-init seems tough. Having something where github contributions were allowed would be nice.
    * [ACTION] Come up with a proposal/procedure/tool to close pull requests and push to launchpad on the user’s behalf
* Andrew: Can CentOS cloud-init align w/ RHEL? What’s with the CentOS EPEL repos etc. for cloud-init?
* RobertS: Canonical needs to support infrastructure that facilitates distro support and contribution by separate interested parties
* RobertS: Feels the CLA is a major hindrance for SUSE; only two contributors are allowed from SUSE, so those two CLA signers have to shepherd fixes in through their signed LP users. The legal dept is concerned about adding more users to a contribution list who have signed license rights away. Any time new contributors are added to the list, they have to talk to lawyers for approval, as well as higher-level management.
    * Concern is around how broad the CLA is compared to other CLAs
    * Feels like the GPL in that it takes over other software developed
* Andrew: previous CLA incarnations having a reference to "interpreted according to British Law" kept US attorneys concerned.
  Changing to more of an Apache-style license reduced concern.
    * "This Agreement will be governed by and construed in accordance with the laws of England"
* The version numbering discussion is a big win for folks
* Improved testing and integration CI is really helpful (good for surfacing systemd dependency trees, etc.)
* Balance between being an upstream project and packaging for various distributions. It's up to the packager to know dependencies, etc. Therefore, should the project carry lots of distribution-specific things? Or be very explicit about packaging (e.g. separate folders for each distro)?
    * E.g. unexpected magic found in setup.py for people trying to make contributions or taking a first look at the project.
    * E.g. templated spec files

--
David Britton <[email protected]>

--
Mailing list: https://launchpad.net/~cloud-init
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~cloud-init
More help   : https://help.launchpad.net/ListHelp

