Package: dgit-infrastructure
Version: 14.10

See #1130639.

I think the following things are wrong:

1. Extraneous copy of the key in tag2upload-builder-01:~tag2upload-builder

The image rebuilder script copies the public key from its own home
directory (~tag2upload-builder) into ~builder in the image.

AFAICT there is no other reason for ~tag2upload-builder to contain
a copy of the public key.  It would probably be better if the
image rebuilder script got the key from somewhere else, and we
deleted the copy in ~tag2upload-builder.

Otherwise, we should make a software (or process) change to ensure
that the key is in fact updated.


2. We did not detect this impending breakage.

In my experience, this kind of lossage is very common in systems
involving expiry times.  It's all very well having a cron job detect
the need for renewal, and a human process that is supposed to sort the
thing out, but IME that is usually not sufficient.  Such things are
IME prone to various kinds of failure.  A backstop is needed.

IMO we should have separate checks for each place a copy of the key
exists, that alert for any failures or omissions of the key expiry
arrangements.

Based on my experience maintaining systems where things can expire
(eg, Let's Encrypt certificates, domain names) I suggest:

* Every location which has a copy of the key, that anything relies on,
  should be checked daily by cron.  I think the locations where
  an expired key would break the system, or break downstreams, are:
       - tag2upload builder VM (~builder)
       - oracle (~tag2upload-oracle)
       - Debian archive tag2upload keyring .deb
       - dak
       - dgit-repos
       - the copy on the wiki

* There should be one central cron job which is responsible for
  sending an email at least once a day if any of these are going to
  expire within the next (say) 21 days.

* The site-specific information could be collected by push or by pull.
  For example, supposing the central copy is on the manager, oracle
  and builder could ssh to the manager daily to deposit copies of
  their keys.  A cron job on the manager could wget the wiki.

* Arguably the cron job which sends the emails should *not* run on the
  manager, since the manager already has the normal cron job that is
  supposed to prompt us to do the updates.  If for some reason cron
  jobs on the manager don't run or can't email us, we'd miss the memo.

  The daily "thing is wrong" cron job could however *retrieve* the
  information from the manager (over public https) and check it.
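
To make the suggestion concrete, here is a rough sketch of what the
central check might look like.  It is not existing dgit-infrastructure
code.  The function names and the idea of passing in a mapping from
location name to key listing are hypothetical.  It assumes each
location's copy of the key has already been collected and turned into
`gpg --with-colons` style output, and that field 7 of the `pub` record
is the expiry time in seconds since the epoch (empty if the key never
expires).  The 21 days is the "(say)" figure from above.

```python
from datetime import datetime, timedelta, timezone

WARN_DAYS = 21  # the "(say) 21 days" threshold suggested above


def primary_key_expiry(colon_listing):
    """Return the expiry of the first `pub` record, or None if it has none.

    Assumption: input is `gpg --with-colons` style text, where field 7
    of a `pub` record is the expiry in seconds since the epoch.
    """
    for line in colon_listing.splitlines():
        fields = line.split(":")
        if fields[0] == "pub":
            if len(fields) < 7 or not fields[6]:
                return None
            return datetime.fromtimestamp(int(fields[6]), tz=timezone.utc)
    raise ValueError("no pub record found")


def problems(listings, now, warn_days=WARN_DAYS):
    """Return one message per location that needs attention.

    `listings` maps a (hypothetical) location name to the colon listing
    of the key copy collected from there, or None if collection failed.
    A missing or unparseable copy is reported too, so that a broken
    collection step cannot silently hide an expiring key.
    """
    deadline = now + timedelta(days=warn_days)
    out = []
    for location, listing in sorted(listings.items()):
        if listing is None:
            out.append(f"{location}: no copy of the key was collected")
            continue
        try:
            expiry = primary_key_expiry(listing)
        except ValueError as e:
            out.append(f"{location}: {e}")
            continue
        if expiry is not None and expiry <= deadline:
            out.append(f"{location}: key expires {expiry:%Y-%m-%d}")
    return out
```

A daily cron job could print the result of problems() and rely on
cron mailing any output, staying silent when the list is empty.
Treating "nothing collected" as a problem in its own right is the
backstop part: the check complains when the arrangements themselves
have failed, not only when a key is about to expire.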

Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
