On Fri, Dec 15, 2023 at 11:37:33AM -0800, Erich Eickmeyer wrote:
> Additionally, the SRU team, Release Team, and Archive Admin team have not
> done any work on what it means to onboard any team members, which is in
> itself a breach of the Code of Conduct:

Erich, you have a pattern of invoking the Code of Conduct against fellow
developers when they disagree with you, which is inappropriately escalatory
and does not advance your purpose.  Please stop.

The teams in question have been asked to document their onboarding
requirements and process.  This is a fair ask, which has not yet been
delivered on publicly for any of the teams in question because it must be
balanced against day-to-day responsibilities.

But you are implying with your message that the lack of DOCUMENTATION for
onboarding onto these teams is the cause of problems with the response to
the recent high-impact SRU regression.

There are many things I think can be improved about how this SRU regression
was handled, and I will go into details below.  But it is unrealistic to
argue that having this documentation in place would have changed the
composition of the teams at the time and thereby prevented this incident. 
In particular, the perceived problem at the time was lack of availability of
an Archive Admin, and a defining principle of membership in the AA team is
this: there are many competent and trustworthy Ubuntu developers who could
do the job of archive admins; but because of the raw control over the
archive that membership in this team confers, the team should be as large as
it needs to be to fulfill its responsibility to the community of Ubuntu
developers, *and no larger*.

So no, writing this down on a wiki page would not have changed the
composition of the Archive Team prior to this event; nor does the fact that
this event happened imply that expanding the archive team is the correct
remedy.


A timeline of events; all times given in US/Pacific to minimize the
possibility of miscalculations on my side.

2023-12-07 13:25: mutter 45.2-0ubuntu1 SRU accepted into mantic-proposed.

2023-12-13 08:03: bug #2046360 opened, reporting a regression in this SRU.
                  uploader of SRU subscribed to bug and bug was tagged
                  regression-proposed.

2023-12-14 04:52: mutter 45.2-0ubuntu1 SRU released into mantic-updates.

2023-12-15 01:22: bug #2046360 re-tagged regression-update.

2023-12-15 05:36: ahasenack asks on a Canonical-internal SRU team chat about
                  stopping phasing for an update.

2023-12-15 07:40: ahasenack pings ubuntu-archive on #ubuntu-release.

2023-12-15 08:54: tsimonq2 responds to the pings on #ubuntu-release.

2023-12-15 09:33: ahasenack asks on a Canonical internal chat for an archive
                  admin but does not highlight AAs by name.

2023-12-15 10:04: aaronprisk (Community Team at Canonical) reaches
                  out to me directly on Canonical internal chat, indicating
                  he had been contacted by tsimonq2.  I do not know if he
                  reached out to other AAs.

2023-12-15 11:09: I notice Aaron's message and indicate I will address this
                  with an ETA of an hour (I am out of the office at the time).

2023-12-15 11:37: preceding message is sent to tech board mailing list.

2023-12-15 12:26: I make it to my computer where I'm able to effect the
                  requested change to SRU phasing.

2023-12-17 19:44: I upload a revert of mutter to mantic-proposed.

2023-12-17 21:41: the revert of mutter is accepted to mantic-proposed by
                  another SRU team member.
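
The intervals between the events in this timeline can be checked with a
short sketch like the one below.  It assumes only the timestamps listed
above; the event labels and the hours_between helper are shorthand
introduced here for illustration, not names used by any Ubuntu tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above.  All are US/Pacific and no
# DST change falls in this window, so naive datetimes are enough for
# computing differences.
FMT = "%Y-%m-%d %H:%M"
EVENTS = {
    "accepted_proposed": "2023-12-07 13:25",
    "bug_opened": "2023-12-13 08:03",
    "released_updates": "2023-12-14 04:52",
    "retagged_regression_update": "2023-12-15 01:22",
    "irc_ping": "2023-12-15 07:40",
    "phasing_changed": "2023-12-15 12:26",
    "revert_uploaded": "2023-12-17 19:44",
}


def hours_between(start, end):
    """Return the elapsed hours between two labelled timeline events."""
    t0 = datetime.strptime(EVENTS[start], FMT)
    t1 = datetime.strptime(EVENTS[end], FMT)
    return (t1 - t0).total_seconds() / 3600


if __name__ == "__main__":
    for start, end in [
        ("bug_opened", "released_updates"),
        ("irc_ping", "phasing_changed"),
        ("retagged_regression_update", "revert_uploaded"),
    ]:
        print(f"{start} -> {end}: {hours_between(start, end):.1f} h")
```

Keeping the timestamps as data makes it easy to recompute any other
interval of interest by adding a pair to the list.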


So there are a number of things that didn't work well here in terms of
process.

- The regression in the SRU was reported by Dan and an appropriate tag was
  set.  However, he did not mark the corresponding SRU bug
  verification-failed, which is part of the process for regression handling
  documented on <https://wiki.ubuntu.com/StableReleaseUpdates#Verification>. 
  So a longstanding member of the Ubuntu Desktop team (but not an Ubuntu
  developer?) was unfamiliar with the necessary process for blocking an SRU
  when a regression is detected.  Do we have gaps in how the existing
  process has been communicated?

- I subscribe to the regression-update and regression-proposed bug tags, but
  we have not set an expectation that all members of the SRU team subscribe
  to these tags.  Comparing the "May be notified" lists on the side bar of
  sample bugs suggested in fact that I was the only member of the SRU team
  subscribed to the regression-proposed tag at the time; and only about half
  of the SRU team members appear to be subscribed to the regression-update
  tag.  Should we require SRU team members to be subscribed to both tags, as
  an additional guard against accidental mis-release of regressions?

- Even if everyone were subscribed to the regression-proposed tag, there's no
  guarantee they've received/seen/read the email before processing the list
  of to-be-released packages on
  <https://ubuntu-archive-team.ubuntu.com/pending-sru.html>; and even if
  they have, they may overlook the connection between such a mail and an
  SRU they are about to release.  Should this report flag all
  regression-proposed bugs open against a package, regardless of series
  targeting?

- Was there agreement about the urgency of the need to disable phasing, and
  was this urgency communicated?  Dan did not set a severity on the bug when
  filing it.  An exchange between ahasenack and jbicha (uploader) on IRC
  yielded a "yes, let's pause phasing", with no apparent expression of
  urgency.  There was no effort, visible to me, to escalate to any archive
  admins individually using available communications channels until
  aaronprisk pinged me, over 4 hours after the initial IRC pings.  (At the
  time the decision was initially made to request a stop for phasing, it was
  still well within European business hours, for any EU-based archive admins
  who were not already out for the end of the year.)

- We have a standing policy of not releasing SRUs on Friday, unless there's
  an exceptional reason to do so and a member of the SRU team commits to
  being available on the weekend to handle any regressions.  This SRU was
  not released on a Friday; it was released on a Thursday.  But it was the
  Thursday before a company-wide end-of-year shutdown and many folks were
  already out on vacation (including myself).  Should we have been releasing
  SRUs this day without verifying there was appropriate capacity for dealing
  with any regressions?  Should there have been an explicit conversation
  about end-of-year plans for SRU releases among the SRU team?  I understand
  there was a specific request to release this SRU before the end of year,
  but it's not clear that this request should have been honored under the
  circumstances.

- The normal process for handling a regression in an SRU is to set phasing
  to zero, to minimize the propagation of the bad update to additional
  users; AND to immediately begin the process of doing a follow-up SRU to
  revert the bad changes so that any users who have already received the bad
  update before changes to the phasing are able to get a fix.  The first
  part of this blocks on availability of an Archive Admin.  The second part
  of this is entirely within the power of the uploader together with a
  member of the SRU team.  But a full day later, there had still not been
  an upload of mutter to mantic-proposed to fix this problem for
  affected users.  Why is that?  Comments from the uploader on IRC:

  12:28 <jbicha> vorlon: unfortunately I'm out of time today to do a new
                 mutter upload.  Would we want the new targeted mutter fix
                 to wait for 7 days too?

  12:29 <jbicha> mutter 45.2 fixes important enough issues that I'd rather
                 go forward than backwards

  But "rather go forward than backwards" has resulted in neither happening
  for over a day.  And "out of time today" came 5 hours after the decision
  that phasing should be halted.

  Robie commented on IRC that there should be a clearer playbook for
  handling regressions.  Absent that, however, it should still be clear that
  turning off phasing for an SRU only prevents it from being delivered to
  MORE users, it does not un-break users who have already received a broken
  SRU.
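
The question raised above, whether the pending-SRU report should flag all
open regression-proposed bugs against a package regardless of series
targeting, could be prototyped along these lines.  This is only a sketch
under stated assumptions: the Bug and PendingSRU records and the
flag_regressions function are hypothetical stand-ins, and it does not
reflect how the actual report or the Launchpad API is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    """Hypothetical, simplified view of a bug report."""
    number: int
    package: str
    tags: set = field(default_factory=set)
    is_open: bool = True


@dataclass
class PendingSRU:
    """Hypothetical, simplified row of a pending-SRU report."""
    package: str
    version: str
    series: str


def flag_regressions(pending, bugs, tag="regression-proposed"):
    """Map each pending SRU to open bugs carrying `tag` on its package.

    Matching is by source package name only, so a tagged bug flags the
    SRU regardless of which series the bug is targeted at.
    """
    tagged = {}
    for bug in bugs:
        if bug.is_open and tag in bug.tags:
            tagged.setdefault(bug.package, []).append(bug.number)
    return {
        (sru.package, sru.version, sru.series): sorted(tagged[sru.package])
        for sru in pending
        if sru.package in tagged
    }
```

Matching on package name alone is deliberately coarse: it may produce
false positives, but it errs on the side of prompting a human to look
before releasing.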

In summary: no Ubuntu core-dev involved in this SRU thought the severity of
the bug was high enough to warrant doing the work of uploading a revert for
over 48 hours after it was known to be a regression in mantic-updates; yet
you are accusing the Archive Team of mismanagement because of a 4-hour delay
in response to a non-urgent request for dealing with the same bug.

-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slanga...@ubuntu.com                                     vor...@debian.org

-- 
Ubuntu-release mailing list
Ubuntu-release@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-release
