Hi Chris,

On Mon, 2021-02-15 at 18:28 +0000, Chris Lamb wrote:
> > Ah, indeed, the failure mode means that the log never made it to
> > buildd.d.o.
> 
> Curious, not heard of that failure mode - is there someplace I can
> learn about that? No worries if not.

I'm not sure if it's documented, but in this case I think enough of the
system was unresponsive or killed to make the connection back to
buildd.d.o fail.

> > I've attached a copy of the log from zani.
> 
> Ah, thanks. Unfortunately, it does not point us straight to the
> solution. I note that you titled this bug "package OOMs" - I point
> this out because the "OOM" text in the log is actually the name of the
> test. As in, here is tests/integration/corrupt-dump.tcl:
> [...]
> Do we have confirmation somewhere that the build is actually OOMing,
> rather than it just timing out on a test that was designed to test
> *for* an OOM condition. This OOM-related bug *should* be fixed by
> virtue of them adding the test to begin with (!) but if we can show
> that it is still OOMing, I suspect that upstream will be able to
> address it quickly.

I don't know how much context would be needed, but the machine
definitely OOMed:

Feb 3 20:45:22 zani/zani kernel: redis-server invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Feb 3 20:45:22 zani/zani kernel: redis-server cpuset=/ mems_allowed=0
Feb 3 20:45:22 zani/zani kernel: CPU: 0 PID: 45952 Comm: redis-server Not tainted 4.19.0-14-s390x #1 Debian 4.19.171-2
Feb 3 20:45:22 zani/zani kernel: Hardware name: IBM 8561 LT1 400 (z/VM 7.1.0)
Feb 3 20:45:22 zani/zani kernel: Call Trace:
Feb 3 20:45:22 zani/zani kernel: ([<0000000000113f2a>] show_stack+0x5a/0x78)
Feb 3 20:45:22 zani/zani kernel: [<0000000000802d1a>] dump_stack+0x8a/0xb8
Feb 3 20:45:22 zani/zani kernel: [<0000000000800962>] dump_header+0x82/0x2c0
Feb 3 20:45:22 zani/zani kernel: [<00000000002b46fe>] oom_kill_process+0xde/0x380
Feb 3 20:45:22 zani/zani kernel: [<00000000002b550c>] out_of_memory+0x24c/0x3b8
Feb 3 21:07:50 zani/zani kernel: [<00000000002bd032>] __alloc_pages_nodemask+0x10b2/0x1160
Feb 3 21:07:50 zani/zani kernel: [<000000000012b0c6>] page_table_alloc+0x15e/0x2c8
Feb 3 21:07:50 zani/zani kernel: [<00000000002f8b76>] __pte_alloc+0x2e/0xf8
Feb 3 21:07:50 zani/zani kernel: [<00000000002ff258>] __handle_mm_fault+0xfc0/0x11c0
Feb 3 21:07:50 zani/zani kernel: [<00000000002ff584>] handle_mm_fault+0x12c/0x298
Feb 3 21:07:50 zani/zani kernel: [<0000000000123a12>] do_dat_exception+0x182/0x440
Feb 3 21:07:50 zani/zani kernel: [<000000000080d9d4>] pgm_check_handler+0x190/0x1e4
...
Feb 3 21:07:50 zani/zani kernel: sshd invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=2, oom_score_adj=-1000
Feb 3 21:07:50 zani/zani kernel: sshd cpuset=/ mems_allowed=0
Feb 3 21:07:50 zani/zani kernel: CPU: 0 PID: 1463 Comm: sshd Not tainted 4.19.0-14-s390x #1 Debian 4.19.171-2
Feb 3 21:07:50 zani/zani kernel: Hardware name: IBM 8561 LT1 400 (z/VM 7.1.0)
Feb 3 21:07:50 zani/zani kernel: Call Trace:
Feb 3 21:07:50 zani/zani kernel: ([<0000000000113f2a>] show_stack+0x5a/0x78)
Feb 3 21:07:50 zani/zani kernel: [<0000000000802d1a>] dump_stack+0x8a/0xb8
Feb 3 21:07:50 zani/zani kernel: [<0000000000800962>] dump_header+0x82/0x2c0
Feb 3 21:07:50 zani/zani kernel: [<00000000002b46fe>] oom_kill_process+0xde/0x380
Feb 3 21:07:50 zani/zani kernel: [<00000000002b550c>] out_of_memory+0x24c/0x3b8
Feb 3 21:07:50 zani/zani kernel: [<00000000002bd032>] __alloc_pages_nodemask+0x10b2/0x1160
Feb 3 21:07:50 zani/zani kernel: [<000000000013e414>] copy_process.part.4+0x24c/0x1fb0
Feb 3 21:07:50 zani/zani kernel: [<0000000000140550>] _do_fork+0xf0/0x430
Feb 3 21:07:50 zani/zani kernel: [<00000000001409ce>] sys_clone+0x3e/0x50
Feb 3 21:07:50 zani/zani kernel: [<000000000080d630>] system_call+0xd8/0x2bc
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 45952 (redis-server), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: sshd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
...
Feb 3 21:07:50 zani/zani kernel: munin-node invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 36654 (schroot), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 34994 (sbuild), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 1508 (syslog-ng), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 1863 (samhain), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: dpkg-buildpackage invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 36655 (dpkg-buildpacka), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

> If it helps, this test was added in this commit:
> 
>   https://github.com/antirez/redis/commit/7ca00d694d44be13a3ff9ff1c96b49222ac9463b
> 
> ... which was in:
> 
>   $ git tag --contains 7ca00d694d44be13a3ff9ff1c96b49222ac9463b
>   6.2-rc1
>   6.2-rc2
>   6.2-rc3
> 
> Not sure if previous s390x builds were failing, which might be
> another route to fixing this.

The most recent s390x log on
https://buildd.debian.org/status/logs.php?pkg=redis&arch=s390x
is for 5:6.2~rc1-3

Looking back, the 6.2~rc2-1 build ends with:

*** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl

The 6.2~rc2-2 build on zandonai ends with similar OOM logs in syslog as
those from zani above, as does the -3 build.

Regards,
Adam
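
P.S. For anyone comparing syslogs from other build hosts against the excerpt above, here is a minimal sketch of how the two kinds of kernel messages quoted in this mail ("invoked oom-killer" and "oom_reaper: reaped process") could be tallied. It is not something that runs on the buildds; the function name summarize_oom is hypothetical, and the patterns are assumptions based only on the line format shown above.

```python
import re
from collections import Counter

# Patterns assumed from the kernel lines quoted in this mail.
INVOKED = re.compile(r"kernel: (.+?) invoked oom-killer:")
REAPED = re.compile(r"oom_reaper: reaped process (\d+) \(([^)]+)\)")


def summarize_oom(lines):
    """Tally OOM events in an iterable of syslog lines.

    Returns a tuple (invokers, reaped):
      invokers - Counter mapping process name to the number of times
                 it invoked the oom-killer
      reaped   - list of (pid, name) for processes the oom_reaper reaped
    """
    invokers = Counter()
    reaped = []
    for line in lines:
        match = INVOKED.search(line)
        if match:
            invokers[match.group(1)] += 1
            continue
        match = REAPED.search(line)
        if match:
            reaped.append((int(match.group(1)), match.group(2)))
    return invokers, reaped
```

Fed an open syslog file (for example `summarize_oom(open(path))`, with path being wherever the host keeps its kernel log), a non-empty result would indicate that the machine itself ran out of memory, as opposed to a test merely having "OOM" in its name.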