On Sun, Oct 16, 2016 at 09:41:28AM -0700, William Tu wrote:
> When running bpf_map_lookup on percpu elements, the bytes copied to
> userspace depends on num_possible_cpus() * value_size, which could
> potentially be larger than memory allocated from user, which depends
> on sysconf(_SC_NPROCESSORS_CONF) to get the current cpu num.  As a
> result, the inconsistency might corrupt the user's stack.
> 
> The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
> happens when cpu hotadd is enabled.  For example, in Fusion when
> setting vcpu.hotadd = "TRUE" or in KVM, setting
>   ./qemu-system-x86_64 -smp 2, maxcpus=4 ...
> the num_possible_cpu() will be 4 and sysconf() will be 2[1].
> Currently the any percpu map lookup suffers this issue, try
> samples/bpf/test_maps.c or tracex3.c.
> 
> Th RFC patch adds additional 'size' param from userspace so that
> kernel knows the maximum memory it should copy to the user.
> 
> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
> 
> Signed-off-by: William Tu <u9012...@gmail.com>
> ---
>  include/uapi/linux/bpf.h       |  5 ++++-
>  kernel/bpf/syscall.c           |  5 +++--
>  samples/bpf/fds_example.c      |  2 +-
>  samples/bpf/libbpf.c           |  3 ++-
>  samples/bpf/libbpf.h           |  2 +-
>  samples/bpf/test_maps.c        | 30 +++++++++++++++---------------
>  tools/include/uapi/linux/bpf.h |  5 ++++-
>  7 files changed, 30 insertions(+), 22 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f09c70b..fa0c40b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -123,7 +123,10 @@ union bpf_attr {
>                       __aligned_u64 value;
>                       __aligned_u64 next_key;
>               };
> -             __u64           flags;
> +             union {
> +                     __u64           flags;
> +                     __u32           size; /* number of bytes allocated in 
> userspace */
> +             };
...
> -     if (copy_to_user(uvalue, value, value_size) != 0)
> +     if (copy_to_user(uvalue, value, min_t(u32, usize, value_size)) != 0)
>               goto free_value;

I think such approach won't actually fix anything. User space
may lose some of the values and won't have any idea what was lost.
I think we need to fix sample code to avoid using sysconf(_SC_NPROCESSORS_CONF)
and use /sys/devices/system/cpu/possible instead.
I would argue that glibc should be fixed as well since relying on
ls -d /sys/devices/system/cpu/cpu[0-9]*|wc -l turned out to be incorrect.

Reply via email to