Thanks for getting this started. Diff comments:
> diff --git a/cloudinit/analyze/README.md b/cloudinit/analyze/README.md > new file mode 100644 > index 0000000..e496d74 > --- /dev/null > +++ b/cloudinit/analyze/README.md We'll want this in docs/rtd > @@ -0,0 +1,167 @@ > +Getting Started > +=============== > + > +You'll need a working LXD setup. If you can lxc launch a machine then you're > +good to go. > + > +1. lxc init ubuntu-daily:yakkety y1 Use xenial here > + > +# for now, we need to adjust cloud-init's logging to emit messages in > +# a more useful format for parsing. > +# > +# NOTE: you need mount-image-callback > +# from https://code.launchpad.net/~smoser/cloud-utils/trunk.mic-lxd > +2. sudo ./bin/inject-payload.sh lxd:y1 > + > +# now you can start it > +3. lxc start y1 > + We can remove this inject section as it's being integrated now > +# now you can analyze it: > +# % bin/cloudinit-analyze --help > +# usage: cloudinit analyze <subcommand> [<args>] > +# > +# Subcommands: > +# blame Print list of executed stages ordered by time to init > +# dump Print an in-order list of events in JSON format > +# show Print an in-order list of stages with execution times > +# > +# Analyze cloud-init logs and data > +# > +# positional arguments: > +# subcommand subcommand to run > +# > +# optional arguments: > +# -h, --help show this help message and exit > + > +# Print boot events, sections ran, unit time and total time, per boot > +4. 
lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit analyze show -i - We'll want to revert this to something like: lxc file pull y1/var/log/cloud-init.log - | cloud-init analyze show -i - > + > + > +# % $cmd | tail -n 20 > +# |`->config-byobu previously ran @01.48600s +00.00000s > +# Finished stage: (modules-config) 00.03300 seconds > +# > +# Starting stage: modules-final > +# |`->config-rightscale_userdata previously ran @01.92300s +00.00000s > +# |`->config-scripts-vendor previously ran @01.92300s +00.00100s > +# |`->config-scripts-per-once previously ran @01.92400s +00.00000s > +# |`->config-scripts-per-boot ran successfully @01.92400s +00.00100s > +# |`->config-scripts-per-instance previously ran @01.92500s +00.00000s > +# |`->config-scripts-user previously ran @01.92500s +00.00100s > +# |`->config-ssh-authkey-fingerprints previously ran @01.92600s +00.00000s > +# |`->config-keys-to-console previously ran @01.92600s +00.00000s > +# |`->config-phone-home previously ran @01.92700s +00.00000s > +# |`->config-final-message ran successfully @01.92700s +00.00600s > +# |`->config-power-state-change previously ran @01.93300s +00.00000s > +# Finished stage: (modules-final) 00.01800 seconds > +# > +# Total Time: 0.25700 seconds > +# > +# 2 boot records analyzed > + > + > +# dump json event stream parsed from cloud-init.log > +5. lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit-analyze dump -i - > + > +# { > +# "description": "config-power-state-change previously ran", > +# "event_type": "finish", > +# "name": "modules-final/config-power-state-change", > +# "origin": "cloudinit", > +# "result": "SUCCESS", > +# "timestamp": 1473722226.711 > +# }, > +# { > +# "description": "running modules for final", > +# "event_type": "finish", > +# "name": "modules-final", > +# "origin": "cloudinit", > +# "result": "SUCCESS", > +# "timestamp": 1473722226.713 > +# } > +# ] > + > + > +# sort events by time length (blame!) > +6.
lxc exec y1 -- journalctl-cloudinitlog | ./bin/cloudinit-analyze blame -i > - > + > +# -- Boot Record 01 -- > +# 00.68700s (modules-config/config-locale) > +# 00.16800s (init-network/config-ssh) > +# 00.14600s (modules-config/config-grub-dpkg) > +# 00.07600s (modules-config/config-apt-configure) > +# 00.01900s (modules-final/config-keys-to-console) > +# 00.01900s (init-local/search-NoCloud) > +# 00.01500s (init-network/config-users-groups) > +# 00.00500s (modules-final/config-final-message) > +# 00.00500s (init-network/consume-vendor-data) > +# 00.00500s (init-network/config-growpart) > +# 00.00400s (init-network/config-set_hostname) > +# 00.00400s (init-network/config-resizefs) > +# 00.00300s (init-network/consume-user-data) > +# 00.00200s (modules-final/config-ssh-authkey-fingerprints) > +# 00.00200s (modules-final/config-scripts-per-instance) > +# 00.00200s (modules-final/config-phone-home) > +# 00.00200s (modules-config/config-snappy) > +# 00.00200s (modules-config/confg-mounts) > +# 00.00200s (modules-config/config-fan) > + > + > +Testing with KVM > +================ > + > +1. Download the current cloud image > + > +% > https://cloud-images.ubuntu.com/daily/server/yakkety/current/yakkety-server-cloudimg-amd64.img > + > +2. Create snapshot image to preserve the original cloud-image. > + > +% qemu-img create -b yakkety-server-cloudimg-amd64.img -f qcow2 > test-cloudinit.qcow2 > + > +3. Inject the cloud-init modifications. Note, this takes longer as it > installs > +profiling packages. > + > +% ./bin/inject-payload.sh ./test-cloudinit.qcow2 > + > +4. Create a seed image using `cloud-localds` > + Drop the integration/injection stuff > +% cat my-user-data > +#cloud-config > +password: passw0rd > +chpasswd: { expire: False } > +ssh_pwauth: True > +ssh_import_id: <gh:userid or lp:userid> > +% echo "instance-id: $(uuidgen || echo i-abcdefg)" > my-meta-data > +% cloud-localds my-seed.img my-user-data my-meta-data > + > +5. 
Launch your modified VM > + > +% kvm -m 512 -net nic -net user -redir tcp:2222::22 \ > + -drive file=test-cloudinit.qcow2,if=virtio,format=qcow2 \ > + -drive file=my-seed.img,if=virtio,format=raw > + > +6. Analyze the boot (blame, dump, show) > + > +% ssh -p 2222 ubuntu@localhost 'cat /var/log/cloud-init.log' | > ./bin/cloudinit-analyze blame -i > + > +7. Profiling Data is present in > /var/lib/cloud/data/cloud-init-<TIMESTAMP>-<PROFILER> > + > +On KVM and other VM-based images, with profiling enabled, blktrace IO and perf > +cpu stacks are recorded for later data analysis. > + > +# IO tracing on the cloud-image disk and seed device > +% ssh -p 2222 ubuntu@localhost > +% cd /var/lib/cloud-init/data/cloud-init-*-blktrace > +% sudo zcat vda1.blktrace.* | blkparse -i - -s > + > +# or for btt data > +sudo gunzip *blktrace* > +sudo blkparse -i vda1 -d bp.vda1.bin > +sudo btt -A -i bp.vda1.bin | sudo tee bp.vda1.txt > + > + > +# Profiling of all CPUs during cloud-init run > +% ssh -p 2222 ubuntu@localhost > +% cd /var/lib/cloud-init/data/cloud-init-*-perf > +% sudo gunzip perf.data.gz > +% sudo perf report --stdio > diff --git a/cloudinit/analyze/show.py b/cloudinit/analyze/show.py > new file mode 100644 > index 0000000..6b4c10c > --- /dev/null > +++ b/cloudinit/analyze/show.py > @@ -0,0 +1,241 @@ > +#!/usr/bin/env python3 Probably should drop this > +# Copyright (C) 2016 Canonical Ltd. > +# > +# Author: Ryan Harper <ryan.har...@canonical.com> > +# > +# This file is part of cloud-init. See LICENSE file for license information.
> +import argparse > +import base64 > +import datetime > +import json > +import os > +import sys > + > +# An event: > +''' > +{ > + "description": "executing late commands", > + "event_type": "start", > + "level": "INFO", > + "name": "cmd-install/stage-late" > + "origin": "cloudinit", > + "timestamp": 1461164249.1590767, > +}, > + > + { > + "description": "executing late commands", > + "event_type": "finish", > + "level": "INFO", > + "name": "cmd-install/stage-late", > + "origin": "cloudinit", > + "result": "SUCCESS", > + "timestamp": 1461164249.1590767 > + } > + > +''' > +format_key = { > + '%d': 'delta', > + '%D': 'description', > + '%E': 'elapsed', > + '%e': 'event_type', > + '%I': 'indent', > + '%l': 'level', > + '%n': 'name', > + '%o': 'origin', > + '%r': 'result', > + '%t': 'timestamp', > + '%T': 'total_time', > +} > + > +formatting_help = " ".join(["{}: {}".format(k.replace('%', '%%'), v) > + for k, v in format_key.items()]) > + > + > +def format_record(msg, event): > + for i, j in format_key.items(): > + if i in msg: > + # ensure consistent formatting of time values > + if j in ['delta', 'elapsed', 'timestamp']: > + msg = msg.replace(i, "{%s:08.5f}" % j) > + else: > + msg = msg.replace(i, "{%s}" % j) > + return msg.format(**event) > + > + > +def dump_event_files(event): > + content = {k: v for k, v in event.items() if k not in ['content']} > + files = content['files'] > + saved = [] > + for f in files: > + fname = f['path'] > + fn_local = os.path.basename(fname) > + fcontent = base64.b64decode(f['content']).decode('ascii') Maybe we can use cloudinit.util.load_file > + with open(fn_local, 'w') as fh: > + fh.write(fcontent) and util.write_file > + saved.append(fn_local) > + > + return saved > + > + > +def event_name(event): > + if event: > + return event.get('name') > + return None > + > + > +def event_type(event): > + if event: > + return event.get('event_type') > + return None > + > + > +def event_parent(event): > + if event: > + return 
event_name(event).split("/")[0] > + return None > + > + > +def event_timestamp(event): > + return float(event.get('timestamp')) > + > + > +def event_datetime(event): > + return datetime.datetime.utcfromtimestamp(event_timestamp(event)) > + > + > +def delta_seconds(t1, t2): > + return (t2 - t1).total_seconds() > + > + > +def event_duration(start, finish): > + return delta_seconds(event_datetime(start), event_datetime(finish)) > + > + > +def event_record(start_time, start, finish): > + record = finish.copy() > + record.update({ > + 'delta': event_duration(start, finish), > + 'elapsed': delta_seconds(start_time, event_datetime(start)), > + 'indent': '|' + ' ' * (event_name(start).count('/') - 1) + '`->', > + }) > + > + return record > + > + > +def total_time_record(total_time): > + return 'Total Time: %3.5f seconds\n' % total_time > + > + > +def generate_records(events, blame_sort=False, > + print_format="(%n) %d seconds in %I%D", > + dump_files=False, log_datafiles=False): > + > + sorted_events = sorted(events, key=lambda x: x['timestamp']) > + records = [] > + start_time = None > + total_time = 0.0 > + stage_start_time = {} > + stages_seen = [] > + boot_records = [] > + > + unprocessed = [] > + for e in range(0, len(sorted_events)): > + event = events[e] > + try: > + next_evt = events[e + 1] > + except IndexError: > + next_evt = None > + > + if event_type(event) == 'start': > + if event.get('name') in stages_seen: > + records.append(total_time_record(total_time)) > + boot_records.append(records) > + records = [] > + start_time = None > + total_time = 0.0 > + > + if start_time is None: > + stages_seen = [] > + start_time = event_datetime(event) > + stage_start_time[event_parent(event)] = start_time > + > + # see if we have a pair > + if event_name(event) == event_name(next_evt): > + if event_type(next_evt) == 'finish': > + records.append(format_record(print_format, > + event_record(start_time, > + event, > + next_evt))) > + else: > + # This is a parent event > + 
records.append("Starting stage: %s" % event.get('name')) > + unprocessed.append(event) > + stages_seen.append(event.get('name')) > + continue > + else: > + prev_evt = unprocessed.pop() > + if event_name(event) == event_name(prev_evt): > + record = event_record(start_time, prev_evt, event) > + records.append(format_record("Finished stage: " > + "(%n) %d seconds ", > + record) + "\n") > + total_time += record.get('delta') > + else: > + # not a match, put it back > + unprocessed.append(prev_evt) > + > + records.append(total_time_record(total_time)) > + boot_records.append(records) > + return boot_records > + > + > +def show_events(events, print_format): > + return generate_records(events, print_format=print_format) > + > + > +def load_events(infile, rawdata=None): > + if rawdata: > + data = rawdata.read() > + else: > + data = infile.read() > + > + j = None > + try: > + j = json.loads(data) > + except json.JSONDecodeError: > + pass > + > + return j, data > + > + > +def main(): > + parser = argparse.ArgumentParser( > + description=('cloudinit-print-log - ' Not sure if we need to drop or modify this. > + 'pretty print and sort cloudinit logs'), > + prog='cloudinit-print-log') > + parser.add_argument('--format', action='store', > + dest='print_format', > + default='%I (%n) %D @%Es +%ds', > + help='specify formatting of output. ' + > + formatting_help) > + parser.add_argument('infile', nargs='?', type=argparse.FileType('r'), > + help='Path to log to parse. 
Use - for stdin') > + > + opts = parser.parse_args(sys.argv[1:]) > + if not opts.infile: > + parser.print_help() > + sys.exit(1) > + > + j, _ = load_events(opts.infile) > + > + records = show_events(j, opts.print_format) > + for record in records: > + print('The total time elapsed since completing an event is printed' > + ' after the "@" character.') > + print('The time the event takes is printed after the "+" character.') > + print() > + print("\n".join(record)) > + > + > +if __name__ == '__main__': > + main() > diff --git a/cloudinit/cmd/main.py b/cloudinit/cmd/main.py > index 139e03b..8e02627 100644 > --- a/cloudinit/cmd/main.py > +++ b/cloudinit/cmd/main.py > @@ -72,18 +74,18 @@ LOG = logging.getLogger() > # and we still want to print exceptions... > def print_exc(msg=''): > if msg: > - sys.stderr.write("%s\n" % (msg)) > + sys.stderr.write('%s\n' % (msg)) > sys.stderr.write('-' * 60) > - sys.stderr.write("\n") > + sys.stderr.write('\n') > traceback.print_exc(file=sys.stderr) > sys.stderr.write('-' * 60) > - sys.stderr.write("\n") > + sys.stderr.write('\n') flag day for " -> ' ? 
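Circling back to show.py before the main.py hunks: the delta/elapsed arithmetic looks right to me. Here's a standalone sanity check of the event_duration path, using the same conversions the module does; the start/finish pair below is made up, modeled on the dump output quoted earlier:

```python
import datetime

# Made-up start/finish pair, modeled on the dump output quoted above.
start = {'name': 'modules-final', 'event_type': 'start',
         'timestamp': 1473722226.113}
finish = {'name': 'modules-final', 'event_type': 'finish',
          'result': 'SUCCESS', 'timestamp': 1473722226.713}


def event_timestamp(event):
    return float(event.get('timestamp'))


def event_datetime(event):
    # same float-timestamp -> datetime conversion show.py uses
    return datetime.datetime.utcfromtimestamp(event_timestamp(event))


def delta_seconds(t1, t2):
    return (t2 - t1).total_seconds()


def event_duration(start, finish):
    return delta_seconds(event_datetime(start), event_datetime(finish))


delta = event_duration(start, finish)
# formatted the way format_record renders %d, i.e. "{delta:08.5f}"
print('+{0:08.5f}s'.format(delta))
```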
> > > def welcome(action, msg=None): > if not msg: > msg = welcome_format(action) > - util.multi_log("%s\n" % (msg), > + util.multi_log('%s\n' % (msg), > console=False, stderr=True, log=LOG) > return msg > > @@ -713,83 +715,74 @@ def main(sysv_args=None): > default=False) > > parser.set_defaults(reporter=None) > - subparsers = parser.add_subparsers() > + subparsers = parser.add_subparsers(title='Subcommands', > dest='subcommand') > + subparsers.required = True > > # Each action and its sub-options (if any) > - parser_init = subparsers.add_parser('init', > - help=('initializes cloud-init and' > - ' performs initial modules')) > - parser_init.add_argument("--local", '-l', action='store_true', > - help="start in local mode (default: > %(default)s)", > - default=False) > - # This is used so that we can know which action is selected + > - # the functor to use to run this subcommand > + parser_analyze = subparsers.add_parser( > + 'analyze', help='Analyze cloud-init logs and data') > + # Construct analyze subcommand parser > + parser_analyze = analyze_parser(parser_analyze) > + > + parser_init = subparsers.add_parser( > + 'init', help='initializes cloud-init and performs initial modules') > + parser_init.add_argument( > + '--local', '-l', action='store_true', default=False, > + help='start in local mode (default: %(default)s)') > parser_init.set_defaults(action=('init', main_init)) > > # These settings are used for the 'config' and 'final' stages > - parser_mod = subparsers.add_parser('modules', > - help=('activates modules using ' > - 'a given configuration key')) > - parser_mod.add_argument("--mode", '-m', action='store', > - help=("module configuration name " > - "to use (default: %(default)s)"), > + parser_mod = subparsers.add_parser( There's a lot of style-refactoring here, but it's not consistent. It's really hard to pick out what's needed for analyze integration versus style changes. 
> + 'modules', help='activates modules using a given configuration key') > + parser_mod.add_argument('--mode', '-m', action='store', > + help=('module configuration name ' here you indent at the open brace, but in the init parser you indent one tab in, which is different from before, where all of them were indented to the first brace. > + 'to use (default: %(default)s)'), > default='config', > choices=('init', 'config', 'final')) > parser_mod.set_defaults(action=('modules', main_modules)) > > # These settings are used when you want to query information > # stored in the cloud-init data objects/directories/files > - parser_query = subparsers.add_parser('query', > - help=('query information stored ' > - 'in cloud-init')) > - parser_query.add_argument("--name", '-n', action="store", > - help="item name to query on", > - required=True, > - choices=QUERY_DATA_TYPES) > + parser_query = subparsers.add_parser( > + 'query', help='query information stored in cloud-init') > + parser_query.add_argument( > + '--name', '-n', action='store', help='item name to query on', > + required=True, choices=QUERY_DATA_TYPES) here it's one tab in, pulled in from the first brace > parser_query.set_defaults(action=('query', main_query)) > > # This subcommand allows you to run a single module > - parser_single = subparsers.add_parser('single', > - help=('run a single module ')) > - parser_single.add_argument("--name", '-n', action="store", > - help="module name to run", > - required=True) > - parser_single.add_argument("--frequency", action="store", > - help=("frequency of the module"), > - required=False, > - choices=list(FREQ_SHORT_NAMES.keys())) > - parser_single.add_argument("--report", action="store_true", > - help="enable reporting", > - required=False) > - parser_single.add_argument("module_args", nargs="*", > - metavar='argument', > - help=('any additional arguments to' > - ' pass to this module')) > + parser_single = subparsers.add_parser( > + 'single', help='run a single module') that looks like
it could fit on one line even > + parser_single.add_argument('--name', '-n', action='store', required=True, > + help='module name to run') these aren't one tab in, they're out to the first brace. If we're going to do all of this code-motion, we should pick something style-wise and be consistent. And preferably in its own MP > + parser_single.add_argument('--frequency', action='store', > + choices=list(FREQ_SHORT_NAMES.keys()), > + help='frequency of the module') > + parser_single.add_argument('--report', action='store_true', > + help='enable reporting') > + parser_single.add_argument('module_args', nargs='*', metavar='argument', > + help=('any additional arguments to pass to ' > + ' this module')) > + parser_single.set_defaults(action=('single', main_single)) > > - parser_dhclient = subparsers.add_parser('dhclient-hook', > - help=('run the dhclient hook' > - 'to record network info')) > - parser_dhclient.add_argument("net_action", > - help=('action taken on the interface')) > + parser_dhclient = subparsers.add_parser( > + 'dhclient-hook', help='run the dhclient hook to record network info') > + parser_dhclient.add_argument('net_action', > + help='action taken on the interface') > parser_dhclient.add_argument("net_interface", > help=('the network interface being acted' > ' upon')) > parser_dhclient.set_defaults(action=('dhclient_hook', dhclient_hook)) > > - parser_features = subparsers.add_parser('features', > - help=('list defined features')) > + parser_features = subparsers.add_parser( > + 'features', help='list defined features') > + parser_features.set_defaults(action=('features', main_features)) > > args = parser.parse_args(args=sysv_args) > > - try: > - (name, functor) = args.action > - except AttributeError: > - parser.error('too few arguments') > + (name, functor) = args.action why do we get to drop the try block here? > > # Setup basic logging to start (until reinitialized) > - # iff in debug mode...
is it true we're always configuring logging or did we just accidentally drop this comment? > if args.debug: > logging.setupBasicLogging() > -- https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/328819 Your team cloud-init committers is requested to review the proposed merge of ~chad.smith/cloud-init:analyze into cloud-init:master.
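One more thought on the dropped try/except around args.action: if the rationale is that the subparsers are now required, that seems right to me. A quick standalone sketch of the argparse behavior I'm assuming (not cloud-init's actual parser; the 'features' handler here is a stand-in lambda):

```python
import argparse

parser = argparse.ArgumentParser(prog='cloud-init')
subparsers = parser.add_subparsers(title='Subcommands', dest='subcommand')
# With a dest set, marking the subparsers required makes argparse itself
# reject a missing subcommand, so args.action is always populated via
# set_defaults() whenever parsing succeeds.
subparsers.required = True

parser_features = subparsers.add_parser('features',
                                        help='list defined features')
parser_features.set_defaults(action=('features', lambda: None))

# Valid subcommand: args.action is set, no try/except needed.
args = parser.parse_args(['features'])
(name, functor) = args.action
print(name)

# Missing subcommand: argparse errors out (SystemExit) before the old
# AttributeError on args.action could ever be raised.
try:
    parser.parse_args([])
except SystemExit:
    print('missing subcommand rejected by argparse')
```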