Hi Simon,

On 21-02-2022 12:10, Simon McVittie wrote:
> Is there anything unusual about the ppc64el CI-runners compared with other
> architectures? (For example: lots of CPUs, few CPUs, lots of RAM, less RAM,
> lots of I/O bandwidth, running on tmpfs, using qemu, using lxc, running
> many tests in parallel, ...)

Our ppc64el runners are quite similar to most of our amd64/i386/arm64 workers in terms of CPU, RAM, etc. The one difference I noticed is that they seem to run in a virtual environment:
debian@ci-worker-ppc64el-01:~$ lspci
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:04.0 Communication controller: Red Hat, Inc. Virtio console
00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)
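
The same check can be scripted. Below is a minimal sketch, assuming that a mention of Virtio or QEMU in the lspci output is a good enough hint of a virtual machine; the function name looks_virtualized is made up for illustration. On systemd-based hosts, systemd-detect-virt is probably the more direct tool.

```shell
#!/bin/sh
# Hypothetical helper: read lspci-style output on stdin and succeed
# (exit status 0) if any line mentions Virtio or QEMU, case-insensitively.
# This is a heuristic, not proof of virtualisation.
looks_virtualized() {
    grep -q -i -E 'virtio|qemu'
}

# Example use on a worker (commented out so that sourcing this file
# does not depend on lspci being installed):
#   lspci | looks_virtualized && echo "probably a virtual machine"
```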

> From https://ci.debian.net/packages/d/dbus/testing/ppc64el/ it looks like
> this is failing about 25% of the time, does that match your experience?

I was judging purely from this page, so yes.

> Bail out! /run/user/1000/dbus-1/services is not a directory

> My best guess at the root cause for this is that when
> gnome-desktop-testing-runner schedules lots of unit tests in a
> newly-opened user session, if the integration test for transient
> services happens to be one of the first ones to be run, then the session
> dbus-daemon will not necessarily have been started by systemd socket
> activation just yet. If the test runner has a large number of CPU cores,
> then that makes it more likely that the test will win the race with the
> dbus-daemon, resulting in failure.

I don't experience our ppc64el hosts as extremely fast, but who knows.
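
To illustrate the race described above (this is not the actual patch, which I haven't seen): a test wrapper could poll for the directory before starting, assuming the directory appears once the session dbus-daemon is up. The function name wait_for_dir and the 10-second timeout are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: poll until a directory exists or a timeout
# (in seconds, default 10) expires.
# Returns 0 if the directory is present, 1 if the timeout was reached.
wait_for_dir() {
    dir=$1
    timeout=${2:-10}
    elapsed=0
    while [ ! -d "$dir" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use before running the transient-services test (commented out
# so that sourcing this file does not block):
#   wait_for_dir "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/dbus-1/services" 10 \
#       || echo "session dbus-daemon does not seem to be up yet" >&2
```

Polling only narrows the window; making sure the session bus has been activated before the test starts would address the ordering itself.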

> I have a possible patch which I'll upload soon. Would you be able to
> schedule several consecutive runs on the affected hardware to make
> sure it's really fixed? 10 runs should be enough for a reasonable level
> of confidence.

Sure, but anybody (with Salsa credentials) can schedule those jobs. Just hitting the retry button will do. Results should come back quickly too, as retries are scheduled with higher priority.
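
As a rough sanity check on the 10-run figure: if the test still failed about 25% of the time and the runs were independent (both are assumptions, and independence is optimistic on shared CI hardware), the chance of 10 passes in a row by pure luck would be 0.75^10, a bit under 6%. The function name below is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: probability that an unfixed flaky test passes
# N consecutive runs, assuming independent runs and a constant
# per-run failure rate.
#   $1 = per-run failure rate (e.g. 0.25)
#   $2 = number of consecutive runs
false_pass_probability() {
    fail_rate=$1
    runs=$2
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v p="$fail_rate" -v n="$runs" \
        'BEGIN { printf "%.4f\n", (1 - p) ^ n }'
}

# false_pass_probability 0.25 10    prints 0.0563
```

So 10 green runs would not be proof, but they would make "still broken, just lucky" fairly unlikely.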

Paul

