Hi there,

We have been investigating an issue observed on POWER8 POWERNV systems. When running the kernel selftest reuseport_bpf_cpu after a CPU hotplug, we see crashes in different forms. [1]
I managed to get xmon on that trap and did some debugging. [2]

I then dumped the BPF JIT code, and it looks different when dumped from CPU#0 and from CPU#0x9f (the one that was hotplugged: offlined, then onlined). Here is my partial analysis. [3]

Basically, the BPF JIT fills a page with invalid instructions (traps, in the ppc64 case) and puts the BPF program at a random offset within the page. On the hotplugged CPU, which was the one that compiled the program, the page had the expected contents: the BPF program started at the offset used to run the program. On the other CPU (in many cases, CPU#0), the same memory address/page had different contents, with the program starting at a different offset.

Is this a bug in the micro-architecture or in the firmware when doing the hotplug? Can someone chime in?

Note that we can't reproduce the same issue on a POWER9 system.

Thanks.
Cascardo.

[1] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076
[2] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/29
[3] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/30
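
For anyone unfamiliar with the layout described above, here is a rough user-space sketch of it. It is not the kernel code. The names place_program, find_program_start and first_difference are hypothetical, the 4 KiB page size is chosen only for illustration, and the trap encoding (0x7fe00008) is what I believe the ppc "trap" instruction assembles to. It shows how comparing two dumps of the same page reveals where each view thinks the program starts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_WORDS 1024u        /* 4 KiB of 32-bit slots; size assumed for illustration */
#define TRAP_INSN  0x7fe00008u  /* assumed encoding of the ppc "trap" instruction */

/* Fill the page with traps, then copy the program in at the given word offset.
 * The real JIT picks the offset randomly; here the caller passes it in. */
void place_program(uint32_t *page, const uint32_t *prog,
                   size_t prog_words, size_t offset_words)
{
    assert(offset_words + prog_words <= PAGE_WORDS);
    for (size_t i = 0; i < PAGE_WORDS; i++)
        page[i] = TRAP_INSN;
    memcpy(page + offset_words, prog, prog_words * sizeof(*prog));
}

/* Index of the first non-trap word, or PAGE_WORDS if the page is all traps. */
size_t find_program_start(const uint32_t *page)
{
    for (size_t i = 0; i < PAGE_WORDS; i++)
        if (page[i] != TRAP_INSN)
            return i;
    return PAGE_WORDS;
}

/* Index of the first word where two dumps of the "same" page disagree,
 * or PAGE_WORDS if they are identical. */
size_t first_difference(const uint32_t *a, const uint32_t *b)
{
    for (size_t i = 0; i < PAGE_WORDS; i++)
        if (a[i] != b[i])
            return i;
    return PAGE_WORDS;
}
```

With two dumps taken from different CPUs, find_program_start on each gives the offset that CPU sees, and first_difference gives the first word where the two views diverge. A CPU that jumps to the offset valid in the other view lands on a trap, which matches the crashes we see.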