Ian Jackson <[email protected]> [13/Mar 8:45pm GMT] wrote:
> 1. Extraneous copy of the key in tag2upload-builder-01:~tag2upload-builder
>
> The image rebuilder script copies the public key from its ~
> into ~builder in the image.
>
> AFAICT there is no other reason for ~tag2upload-builder to contain
> a copy of the public key. It would probably be better if the
> image rebuilder script got the key from somewhere else, and we
> deleted the copy in ~tag2upload-builder.

ACK, sounds good. Also we'd want to update the instructions saying to
put a copy of the key there.

> 2. We did not detect this impending breakage.
>
> In my experience, this kind of lossage is very common in systems
> involving expiry times. It's all very well having a cron job detect
> the need for renewal, and a human process that is supposed to sort the
> thing out, but IME that is usually not sufficient. Such things are
> IME prone to various kinds of failure. A backstop is needed.
>
> IMO we should have separate checks for each place a copy of the key
> exists, that alert for any failures or omissions of the key expiry
> arrangements.

I don't mind more cron jobs/e-mails so long as they are deployed cleanly
and are consistent with each other.

> Based on my experience maintaining systems where things can expire
> (eg, Let's Encrypt certificates, domain names) I suggest:
>
> * Every location which has a copy of the key, that anything relies on,
>   should be checked daily by cron. I think the locations where
>   an expired key would break the system, or break downstreams, are
>     - tag2upload builder VM (~builder)
>     - oracle (~tag2upload-oracle)
>     - Debian archive tag2upload keyring .deb
>     - dak
>     - dgit-repos
>     - the copy on the wiki
>
> * There should be one central cron job which is responsible for
>   sending an email at least once a day if any of these are going to
>   expire within the next (say) 21 days.
>
> * The site-specific information could be collected by push or by pull.
>   For example, supposing the central copy is on the manager, oracle
>   and builder could ssh to the manager daily to deposit copies of
>   their keys. A cron job on the manager could wget the wiki.
>
> * Arguably the cron job which sends the emails should *not* run on the
>   manager, since the manager already has the normal cron job that is
>   supposed to prompt us to do the updates. If for some reason cron
>   jobs on the manager don't run or can't email us, we'd miss the memo.
>
> The daily "thing is wrong" cron job could however *retrieve* the
> information from the manager (over public https) and check it.

Tbh I think this is overengineering. Why not just add separate cron jobs
for all of the places?

-- 
Sean Whitton
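
For illustration only, a per-location check of the kind discussed above
might look roughly like the sketch below. It is not taken from the
tag2upload deployment: the function names are hypothetical, and it
assumes each location's copy of the key is listed beforehand with
something like `gpg --with-colons --show-keys FILE`, whose `pub` records
carry the expiry as seconds since the epoch in the seventh field. The
21-day window is the figure suggested in the quoted proposal.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

WARN_WINDOW = timedelta(days=21)  # "(say) 21 days", from the quoted proposal


def key_expiries(colon_listing: str) -> Dict[str, Optional[datetime]]:
    """Map each primary key ID to its expiry time (None = never expires).

    `colon_listing` is assumed to be the text printed by
    `gpg --with-colons --show-keys FILE` for one copy of the key.
    """
    expiries: Dict[str, Optional[datetime]] = {}
    for line in colon_listing.splitlines():
        fields = line.split(":")
        # Only primary-key records; skip anything too short to parse.
        if fields[0] != "pub" or len(fields) < 7:
            continue
        key_id, expires = fields[4], fields[6]
        expiries[key_id] = (
            datetime.fromtimestamp(int(expires), tz=timezone.utc)
            if expires
            else None
        )
    return expiries


def expiring_soon(
    colon_listing: str, now: datetime, window: timedelta = WARN_WINDOW
) -> List[str]:
    """Return IDs of keys that have expired or will expire within `window`."""
    return sorted(
        key_id
        for key_id, expiry in key_expiries(colon_listing).items()
        if expiry is not None and expiry - now <= window
    )
```

A cron job at each location could feed its own listing to
`expiring_soon()` and send mail whenever the result is non-empty, which
keeps the checks separate but consistent with each other.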

