Hi,
I'm seeing an intermittent crash with "BusyBox v1.36.0 (2023-01-03 22:49:12
UTC)" (the binary from the Docker image busybox:musl) when running on amd64
GitHub Actions runner VMs (Azure).
When I run sha256sum, it is occasionally terminated with SIGILL (Illegal
instruction).
The issue is hard to reproduce, but I have a GitHub Actions CI/CD job that I can
re-run repeatedly (with no changes to code, environment, input data, etc.) and
that occasionally hits it. I managed to capture a core dump:
/ # file core-sha256sum.10.1677605036
core-sha256sum.10.1677605036: ELF 64-bit LSB core file, x86-64, version 1
(SYSV), SVR4-style, from 'sha256sum -w -s -c -', real uid: 0, effective uid: 0,
real gid: 0, effective gid: 0, execfn: '/bin/sha256sum', platform: 'x86_64'
# gdb /bin/sha256sum core-sha256sum.10.1677605036
Reading symbols from /bin/sha256sum...
(No debugging symbols found in /bin/sha256sum)
[New LWP 10]
Core was generated by `sha256sum -w -s -c -'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x0000000000401161 in ?? ()
If I use "layout asm" it shows this:
> 0x401161 or (%rax),%al
Here's the result of an strace:
2023-02-28T21:24:48.2816253Z execve("/usr/bin/sha256sum", ["sha256sum", "-w", "-s", "-c", "-"], 0x7ffe64982460 /* 5 vars */) = 0
2023-02-28T21:24:48.2816600Z arch_prctl(ARCH_SET_FS, 0x51c258) = 0
2023-02-28T21:24:48.2816865Z set_tid_address(0x51cbd8) = 15
2023-02-28T21:24:48.2818368Z getuid() = 0
2023-02-28T21:24:48.2818764Z brk(NULL) = 0x1803000
2023-02-28T21:24:48.2819202Z brk(0x1805000) = 0x1805000
2023-02-28T21:24:48.2821626Z mmap(0x1803000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1803000
2023-02-28T21:24:48.2822306Z mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54496bc000
2023-02-28T21:24:48.2822893Z read(0, "e4d5808efbd4239a2f496b6055ac15b2"..., 1024) = 154
2023-02-28T21:24:48.2823486Z mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f54496bb000
2023-02-28T21:24:48.2823993Z open("1.544MiB.bin", O_RDONLY|O_LARGEFILE) = 3
2023-02-28T21:24:48.2824623Z read(3, "\270\222\2W\262\203&^\1\304X\372\247H\6\261\212\220\303i0D\266tx\356\353\370\327\363\354q"..., 4096) = 4096
2023-02-28T21:24:48.2825392Z --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x401161} ---
2023-02-28T21:24:48.2826349Z +++ killed by SIGILL (core dumped) +++
2023-02-28T21:24:48.2846595Z Illegal instruction
Here's a later one with a simpler set of arguments:
2023-02-28T21:47:29.2574112Z + strace sha256sum 1.544MiB.bin 1MiB.bin
2023-02-28T21:47:29.2611500Z execve("/usr/bin/sha256sum", ["sha256sum", "1.544MiB.bin", "1MiB.bin"], 0x7ffc41e4baf0 /* 5 vars */) = 0
2023-02-28T21:47:29.2616858Z arch_prctl(ARCH_SET_FS, 0x51c258) = 0
2023-02-28T21:47:29.2617173Z set_tid_address(0x51cbd8) = 10
2023-02-28T21:47:29.2617428Z getuid() = 0
2023-02-28T21:47:29.2617682Z brk(NULL) = 0x1c4d000
2023-02-28T21:47:29.2623375Z brk(0x1c4f000) = 0x1c4f000
2023-02-28T21:47:29.2623919Z mmap(0x1c4d000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1c4d000
2023-02-28T21:47:29.2625076Z mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a9de51000
2023-02-28T21:47:29.2625396Z open("1.544MiB.bin", O_RDONLY|O_LARGEFILE) = 3
2023-02-28T21:47:29.2625758Z read(3, "\270\222\2W\262\203&^\1\304X\372\247H\6\261\212\220\303i0D\266tx\356\353\370\327\363\354q"..., 4096) = 4096
2023-02-28T21:47:29.2626698Z --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x401161} ---
2023-02-28T21:47:29.2632505Z +++ killed by SIGILL (core dumped) +++
2023-02-28T21:47:29.2632762Z Illegal instruction
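To get a rough failure rate outside of a full CI re-run, the simpler invocation
can be looped. This is only a sketch, assuming a shell that reports death by
signal as 128 plus the signal number (so 132 for SIGILL, which is signal 4 on
Linux); the function name count_sigill is hypothetical:

```shell
#!/bin/sh
# Run a command N times and print how many runs were killed by SIGILL.
# Assumption: the shell reports "killed by signal S" as exit status 128 + S.
# SIGILL is signal 4 on Linux, so the status checked for is 132.
count_sigill() {
    runs=$1
    shift
    hits=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Capture the status without tripping "set -e" in the caller.
        status=0
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 132 ]; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    echo "$hits"
}

# Example, using the file names from the strace run above:
# count_sigill 100 sha256sum 1.544MiB.bin 1MiB.bin
```

The function prints only the number of SIGILL terminations, so the result can
be logged next to the run ID of each job.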
At the following URL, the first run didn't fail but the second one did (I just
clicked "Re-run all jobs" without changing anything else):
https://github.com/backplane/avxtest/actions/runs/4297660126
(The name of that repo reflects my initial theory about the problem, but I
don't see any evidence that AVX is involved.)
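Because identical re-runs sometimes pass and sometimes fail, one thing that
could be logged in each job is the runner's CPU model and feature flags, in case
failing runs land on different hardware. A minimal sketch, assuming a Linux
runner where /proc/cpuinfo is readable; the helper name cpu_summary and the
list of flags are my own choices for illustration, and I have not confirmed
that any of these features is involved:

```shell
#!/bin/sh
# Print the CPU model and whether selected feature flags are present.
# Reads /proc/cpuinfo by default; another file can be passed as $1.
# The flag list below is an illustrative guess, not a confirmed cause.
cpu_summary() {
    info=${1:-/proc/cpuinfo}
    # First "model name" line; all cores normally report the same model.
    sed -n 's/^model name[[:space:]]*: //p' "$info" | head -n 1
    # First "flags" line, padded with spaces so whole-word matching is simple.
    flags=" $(sed -n 's/^flags[[:space:]]*: //p' "$info" | head -n 1) "
    for f in ssse3 sse4_1 avx avx2 sha_ni; do
        case "$flags" in
            *" $f "*) echo "$f=yes" ;;
            *)        echo "$f=no" ;;
        esac
    done
}

# Example: call it right before the failing command in the CI job.
# cpu_summary
```

Comparing this output between a passing and a failing run would show whether
the two landed on different CPU types.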
You can bring up a shell with the core dump (and binary) in question using:
docker run --rm -it --platform linux/amd64 ghcr.io/backplane/avxtest:1677620842
I use an Alpine image to get strace (and, earlier, gdb, file, etc.), but the
binary under test is the one from the busybox:musl image.
Any idea what's going on?
Thanks,
Ben
_______________________________________________
busybox mailing list
http://lists.busybox.net/mailman/listinfo/busybox