On Sun, Oct 16, 2016 at 09:41:28AM -0700, William Tu wrote: > When running bpf_map_lookup on percpu elements, the bytes copied to > userspace depends on num_possible_cpus() * value_size, which could > potentially be larger than memory allocated from user, which depends > on sysconf(_SC_NPROCESSORS_CONF) to get the current cpu num. As a > result, the inconsistency might corrupt the user's stack. > > The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu() > happens when cpu hotadd is enabled. For example, in Fusion when > setting vcpu.hotadd = "TRUE" or in KVM, setting > ./qemu-system-x86_64 -smp 2, maxcpus=4 ... > the num_possible_cpu() will be 4 and sysconf() will be 2[1]. > Currently the any percpu map lookup suffers this issue, try > samples/bpf/test_maps.c or tracex3.c. > > Th RFC patch adds additional 'size' param from userspace so that > kernel knows the maximum memory it should copy to the user. > > [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html > > Signed-off-by: William Tu <u9012...@gmail.com> > --- > include/uapi/linux/bpf.h | 5 ++++- > kernel/bpf/syscall.c | 5 +++-- > samples/bpf/fds_example.c | 2 +- > samples/bpf/libbpf.c | 3 ++- > samples/bpf/libbpf.h | 2 +- > samples/bpf/test_maps.c | 30 +++++++++++++++--------------- > tools/include/uapi/linux/bpf.h | 5 ++++- > 7 files changed, 30 insertions(+), 22 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index f09c70b..fa0c40b 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -123,7 +123,10 @@ union bpf_attr { > __aligned_u64 value; > __aligned_u64 next_key; > }; > - __u64 flags; > + union { > + __u64 flags; > + __u32 size; /* number of bytes allocated in > userspace */ > + }; ... > - if (copy_to_user(uvalue, value, value_size) != 0) > + if (copy_to_user(uvalue, value, min_t(u32, usize, value_size)) != 0) > goto free_value;
I think such approach won't actually fix anything. User space may lose some of the values and won't have any idea what was lost. I think we need to fix sample code to avoid using sysconf(_SC_NPROCESSORS_CONF) and use /sys/devices/system/cpu/possible instead. I would argue that glibc should be fixed as well since relying on ls -d /sys/devices/system/cpu/cpu[0-9]*|wc -l turned out to be incorrect.