Re: [PATCH bpf-next v2] bpf: using rcu_read_lock for bpf_sk_storage_map iterator

Martin KaFai Lau Tue, 15 Sep 2020 22:38:45 -0700

On Tue, Sep 15, 2020 at 03:35:07PM -0700, Yonghong Song wrote:
> If a bucket contains a lot of sockets, during bpf_iter traversing
> a bucket, concurrent userspace bpf_map_update_elem() and
> bpf program bpf_sk_storage_{get,delete}() may experience
> some undesirable delays as they will compete with bpf_iter
> for bucket lock.
> 
> Note that the number of buckets for bpf_sk_storage_map
> is roughly the same as the number of cpus. So if there
> are lots of sockets in the system, each bucket could
> contain lots of sockets.
> 
> Different actual use cases may experience different delays.
> Here, using selftest bpf_iter subtest bpf_sk_storage_map,
> I hacked the kernel with ktime_get_mono_fast_ns()
> to collect the time when a bucket was locked
> during bpf_iter prog traversing that bucket. This way,
> the maximum incurred delay was measured w.r.t. the
> number of elements in a bucket.
>     # elems in each bucket          delay(ns)
>       64                            17000
>       256                           72512
>       2048                          875246
> 
> The potential delays will be further increased if
> we have even more elemnts in a bucket. Using rcu_read_lock()
> is a reasonable compromise here. It may lose some precision, e.g.,
> access stale sockets, but it will not hurt performance of
> bpf program or user space application which also tries
> to get/delete or update map elements.
> 
> Cc: Martin KaFai Lau <[email protected]>
> Acked-by: Song Liu <[email protected]>
> Signed-off-by: Yonghong Song <[email protected]>
> ---
>  net/core/bpf_sk_storage.c | 21 ++++++++-------------
>  1 file changed, 8 insertions(+), 13 deletions(-)
> 
> Changelog:
>  v1 -> v2:
>    - added some performance number. (Song)
>    - tried to silence some sparse complains. but still has some left like
>        context imbalance in "..." - different lock contexts for basic block
>      which the code is too hard for sparse to analyze. (Jakub)
> 
> diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> index 4a86ea34f29e..4fc6b03d3639 100644
> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -678,6 +678,7 @@ struct bpf_iter_seq_sk_storage_map_info {
>  static struct bpf_local_storage_elem *
>  bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info 
> *info,
>                                struct bpf_local_storage_elem *prev_selem)
> +     __acquires(RCU) __releases(RCU)
>  {
>       struct bpf_local_storage *sk_storage;
>       struct bpf_local_storage_elem *selem;
> @@ -701,7 +702,7 @@ bpf_sk_storage_map_seq_find_next(struct 
> bpf_iter_seq_sk_storage_map_info *info,
>               if (!selem) {
>                       /* not found, unlock and go to the next bucket */
>                       b = &smap->buckets[bucket_id++];
> -                     raw_spin_unlock_bh(&b->lock);
> +                     rcu_read_unlock();
>                       skip_elems = 0;
>                       break;
>               }
> @@ -715,7 +716,7 @@ bpf_sk_storage_map_seq_find_next(struct 
> bpf_iter_seq_sk_storage_map_info *info,
>  
>       for (i = bucket_id; i < (1U << smap->bucket_log); i++) {
>               b = &smap->buckets[i];
> -             raw_spin_lock_bh(&b->lock);
> +             rcu_read_lock();
>               count = 0;
>               hlist_for_each_entry(selem, &b->list, map_node) {
hlist_for_each_entry_rcu()


>                       sk_storage = rcu_dereference_raw(selem->local_storage);
Does lockdep complain without "_raw"?

Re: [PATCH bpf-next v2] bpf: using rcu_read_lock for bpf_sk_storage_map iterator

Reply via email to