Hello,

On Thu 16 Oct 2025 at 03:52pm +01, Ian Jackson wrote:

> Package: dgit-infrastructure
> Version: 13.16
> Severity: important
>
> t2u job 1459 failed Irrecoverable because the builder host rebooted
> mid-build.  I asked DSA, and:
>
> Aurelien Jarno via RT writes ("[rt.debian.org #9884] Reboot(?) of 
> tag2upload-builder-01"):
>> On all our hosts, reboot is only possible after taking the
>> /var/run/reboot-lock lock. Therefore for the critical part of the
>> tag2upload service you should take this lock. For instance to take
>> the lock for the duration of a script:
>>
>>   flock -s -n /var/run/reboot-lock your_script
>>
>> You can use the -E option to return a different error code when the
>> lock hasn't been acquired.
>>
>> Of course you should minimize the time when the lock is taken to not
>> make reboot difficult or impossible.
>
> We should implement this.  I'm not sure exactly how, though - this
> would have to happen on the builder outside the VM, and we currently
> don't run many commands there.  dgit-repos-server doesn't get a way to
> do that right now.  But maybe the oracled could do it.
>
> I guess we can think about this in the context of the retry work.

To me it makes sense for the Oracle to do this, because we'll want to
prevent reboots of the Oracle host, too, so it can hold off reboots of
both hosts at the same time.
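
As a rough sketch only, the invocation DSA suggests could be wrapped
like this.  The function name, the REBOOT_LOCK override and the exit
code 75 are placeholders of mine, not anything that exists in dgit or
the Oracle today, and this says nothing yet about how the lock would
actually get taken on the builder, outside the VM:

```shell
#!/bin/sh
# Hypothetical wrapper around the flock(1) call quoted above.
# REBOOT_LOCK defaults to the path DSA gave; the override is an
# assumption, added only so the wrapper can be tried on a scratch file.
REBOOT_LOCK=${REBOOT_LOCK:-/var/run/reboot-lock}

# Arbitrary placeholder exit code meaning "could not get the lock".
LOCK_BUSY=75

with_reboot_lock () {
    # -s  shared lock, so several jobs can hold it at once
    # -n  fail at once instead of waiting if the lock is held exclusively
    # -E  exit code to return when the lock was not acquired
    flock -s -n -E "$LOCK_BUSY" "$REBOOT_LOCK" "$@"
}
```

It would be used as "with_reboot_lock your_script".  When the lock is
acquired, flock passes through the exit status of the command, so the
caller can only tell "lock busy" apart from a real failure if the
command itself never exits with the code chosen for -E.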

I'll work on this.

-- 
Sean Whitton
