reboot-lock ineffective on tag2upload-builder-01

Adam D. Barratt via RT Sat, 17 Jan 2026 07:29:16 -0800

On Sat Jan 17 15:19:42 2026, [email protected] wrote:
> Hello Adam,
> 
> Thank you for a very helpful message.  I'm quoting more than I'm
> replying to in order to copy your message to the BTS.


Unfortunately it may not actually have been that helpful. See my follow-up 
responses for corrections. :-(

> Adam D. Barratt [17/Jan  2:46pm GMT] wrote:
> > On Sat Jan 17 14:02:53 2026, [email protected] wrote:
> >> Hello,
> >>
> >> Aurelien Jarno [17/Jan 12:36pm GMT] wrote:
> >> > I haven't looked at all the details, but here are a few things
> >> > from the logs.
> >> > The reboot of tag2upload-builder-01 was scheduled at 14:12:29. It
> >> > indeed caused a podman container to be stopped:
> > [...]
> >> > Could you please confirm from your logs that the reboot lock was
> >> > indeed taken by your tag2upload job?
> >>
> >> It doesn't print anything if it successfully takes the lock, but it
> >> prints something and exits if it fails to take the lock (verified by
> >> our test suite), and the logs indicate it did not exit.  So, yes,
> >> I can confirm that the job did indeed take the lock.
> >
> > Looking through the log of #1125239, I think some of the timings have
> > been confused, so it would be worth checking the process flow.
> 
> Hrm, yes.  The Podman error comes much earlier than 14:12.
> So possibly that Podman error is a completely unrelated bug in Podman.
> It may have been introduced by the upgrade to trixie.
> Ian, what are your thoughts on this?

For clarity, the builder was rebooted at 14:12, but the oracle rebooted at 
13:54 and the manager at 13:58.
 
> > | Jan 10 13:53:44 tag2upload-oracle-01 tag2upload-oracled[2556368]: [t2u-oracled tag2upload-builder-01.debian.org,2556368][2026-01-10T13:53:44] group_leader: received SIGTERM; shutting down workers
> > | Jan 10 13:53:44 tag2upload-oracle-01 systemd[2556306]: Stopping tag2upload-oracled.service - tag2upload Oracle daemon...
> > | Jan 10 13:53:44 tag2upload-oracle-01 systemd[2556306]: Stopped tag2upload-oracled.service - tag2upload Oracle daemon.
> > | -- Boot cbbd32cac2974b5e901921187e477fa7 --
> > |
> > | This is the host rebooting.
> >
> > In fact, it's not - as Aurelien noted, the reboot was at 14:12.

(It is, just not the host I thought at first.)

[...]
> Okay, so based on this information it looks like we have an
> incompatibility between our locking arrangements, regardless of
> whether tag2upload job 2390 failed because of a reboot.
> 
> In particular, when implementing the locking I had been assuming that
> /var/run/reboot-lock would remain locked while a reboot was pending.
> But in fact it isn't.

It should be; I misread how the molly-guard script was running. So please
ignore that section, and apologies for the confusion. :-|
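
For illustration only, here is a minimal sketch of the behaviour described
above: a job takes the lock silently, and prints a message and gives up if
the lock is already held. It assumes the lock is an advisory flock(2) lock
on the file, which nothing in this thread confirms. The function names are
hypothetical and this is not the actual tag2upload or molly-guard code.

```python
import fcntl
import os
import sys


def take_reboot_lock(path):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file descriptor on success (it must stay open for as
    long as the lock is to be held), or None if the lock is already held.
    ASSUMPTION: the real reboot lock uses flock(2); that is not confirmed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; do not wait for it.
        os.close(fd)
        return None
    return fd


def release_reboot_lock(fd):
    """Drop the lock and close the descriptor returned by take_reboot_lock."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def run_with_reboot_lock(path, job):
    """Run `job()` while holding the lock; return False if it was unavailable.

    Silent on success, prints to stderr on failure, mirroring the behaviour
    described in the quoted message.
    """
    fd = take_reboot_lock(path)
    if fd is None:
        print("reboot lock is held; not starting job", file=sys.stderr)
        return False
    try:
        job()
    finally:
        release_reboot_lock(fd)
    return True
```

Under that assumption, a reboot script that holds the same lock for as
long as a reboot is pending would make run_with_reboot_lock() refuse to
start new jobs, and a running job would make the reboot script wait or
fail, depending on whether it blocks on the lock.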

Regards,

Adam

Bug#1125239: /var/run/reboot-lock ineffective on tag2upload-builder-01