On Tue, Dec 11, 2018 at 12:14:12PM +0100, Daniel Borkmann wrote:
> Michael and Sandipan report:
>
> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
> JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
> and is adjusted again at init time if MODULES_VADDR is defined.
>
> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
> the compile-time default at boot time, which is 0x9c400000 when
> using 64K page size. This overflows the signed 32-bit bpf_jit_limit
> value:
>
>   root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
>   -1673527296
>
> and can cause various unexpected failures throughout the network
> stack. In one case `strace dhclient eth0` reported:
>
>   setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
>   16) = -1 ENOTSUPP (Unknown error 524)
>
> and similar failures can be seen with tools like tcpdump. This doesn't
> always reproduce, however, and I'm not sure why. The more consistent
> failure I've seen is that an Ubuntu 18.04 KVM guest booted on a POWER9
> host times out on systemd/netplan configuring a virtio-net NIC, with
> no noticeable errors in the logs.
>
> Given this, and also given that in the near future some architectures
> like arm64 will have a custom area for BPF JIT image allocations, we
> should get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default
> entirely. For 4.21, we have an overridable bpf_jit_alloc_exec() and
> bpf_jit_free_exec(), so add another overridable
> bpf_jit_alloc_exec_limit() helper function which returns the possible
> size of the memory area for deriving the default heuristic in
> bpf_jit_charge_init().
>
> Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
> bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
> JIT memory provider. Therefore, for archs that implement their own
> module_alloc(), we use MODULES_{END,_VADDR} for the limit; otherwise,
> for vmalloc_exec() cases like ppc64, we use VMALLOC_{END,_START}.
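The arithmetic behind the negative sysctl value in the report can be checked with a small userspace C sketch. It relies only on what the report states (a compile-time default of PAGE_SIZE * 40000 and a signed 32-bit bpf_jit_limit); the function and macro names are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Page sizes for comparison: 4K and the 64K used by the ppc64 kernels. */
#define SKETCH_PAGE_SIZE_4K  4096ULL
#define SKETCH_PAGE_SIZE_64K 65536ULL

/* Compile-time fallback described in the report: PAGE_SIZE * 40000. */
uint64_t sketch_jit_limit_default(uint64_t page_size)
{
	return page_size * 40000ULL;
}

/*
 * Value a signed 32-bit variable ends up holding when v is truncated to
 * 32 bits and read back as two's complement. Computed with 64-bit
 * arithmetic so it does not rely on implementation-defined conversion.
 */
int64_t sketch_as_signed32(uint64_t v)
{
	uint64_t low = v & 0xFFFFFFFFULL;

	return low >= 0x80000000ULL ? (int64_t)low - 0x100000000LL
				    : (int64_t)low;
}
```

With 64K pages the default is 0x9c400000 (2621440000), which exceeds INT32_MAX and reads back as -1673527296, the value shown in /proc. With 4K pages the product, 163840000, still fits, which is consistent with the report tying the overflow to 64K pages.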
>
> Additionally, for archs supporting large page sizes, we should change
> the sysctl to be handled as a long so we don't run into sysctl
> restrictions in the future.
>
> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
> Reported-by: Sandipan Das <sandi...@linux.ibm.com>
> Reported-by: Michael Roth <mdr...@linux.vnet.ibm.com>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
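The mechanism the patch describes (an overridable helper returning the size of the JIT memory area, a default limit derived from it at init time, and a sysctl handled as a long) can be sketched in userspace C. This is not the kernel code: the sketch_* names, the address values, and the quarter-of-the-area heuristic are assumptions for illustration, since the description does not spell out the exact formula used in bpf_jit_charge_init().

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical layout for an arch without a separate module area
 * (ppc64-like). Values are made up. Define SKETCH_MODULES_VADDR and
 * SKETCH_MODULES_END instead to model an arch with its own module area.
 */
#define SKETCH_VMALLOC_START 0xc008000000000000ULL
#define SKETCH_VMALLOC_END   0xc00a000000000000ULL
#define SKETCH_PAGE_SIZE     65536ULL /* 64K pages */

/*
 * Default size of the JIT memory area. Marked weak (GCC/Clang, ELF) so
 * another object file could provide a strong definition, similar in
 * spirit to the overridable helper described in the patch.
 */
__attribute__((weak)) uint64_t sketch_jit_alloc_exec_limit(void)
{
#if defined(SKETCH_MODULES_VADDR)
	return SKETCH_MODULES_END - SKETCH_MODULES_VADDR;
#else
	return SKETCH_VMALLOC_END - SKETCH_VMALLOC_START;
#endif
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Derive the default limit. ASSUMPTION: keep a quarter of the area,
 * page aligned, clamped to LONG_MAX because the sysctl is a long.
 */
long sketch_jit_charge_init(void)
{
	uint64_t limit = sketch_round_up(sketch_jit_alloc_exec_limit() >> 2,
					 SKETCH_PAGE_SIZE);

	return limit > (uint64_t)LONG_MAX ? LONG_MAX : (long)limit;
}
```

A weak default lets an arch replace the helper at link time without extra #ifdefs in the caller, and clamping to LONG_MAX keeps the derived value representable in a long sysctl even with large page sizes.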

Applied, thanks.