On Wed, 2017-02-22 at 11:38 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <will...@google.com>
> 
> Refine skb_copy_ubufs to support compound pages. With upcoming TCP
> and UDP zerocopy sendmsg, such fragments may appear.
> 
> These skbuffs can have both kernel and zerocopy fragments, e.g., when
> corking. Avoid unnecessary copying of fragments that have no userspace
> reference.
> 
> It is not safe to modify skb frags when the skbuff is shared. This
> should not happen. Fail loudly if we find an unexpected edge case.
> 
> Signed-off-by: Willem de Bruijn <will...@google.com>
> ---
>  net/core/skbuff.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f3557958e9bf..67e4216fca01 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -944,6 +944,9 @@ EXPORT_SYMBOL_GPL(skb_morph);
>   *   If this function is called from an interrupt gfp_mask() must be
>   *   %GFP_ATOMIC.
>   *
> + *   skb_shinfo(skb) can only be safely modified when not accessed
> + *   concurrently. Fail if the skb is shared or cloned.
> + *
>   *   Returns 0 on success or a negative error code on failure
>   *   to allocate kernel memory to copy to.
>   */
> @@ -954,11 +957,29 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>       struct page *page, *head = NULL;
>       struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
>  
> +     if (skb_shared(skb) || skb_cloned(skb)) {
> +             WARN_ON_ONCE(1);
> +             return -EINVAL;
> +     }
> +
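For reference, the two checks cover both forms of sharing: skb_shared()
looks at the skb's own refcount, while skb_cloned() looks at the
shared-info dataref. Roughly, from include/linux/skbuff.h of this era:

	static inline int skb_shared(const struct sk_buff *skb)
	{
		return atomic_read(&skb->users) != 1;
	}

	static inline int skb_cloned(const struct sk_buff *skb)
	{
		return skb->cloned &&
		       (atomic_read(&skb_shinfo(skb)->dataref) &
			SKB_DATAREF_MASK) != 1;
	}

Either one being true means someone else may still be reading the frags
while we swap them out.
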
>       for (i = 0; i < num_frags; i++) {
>               u8 *vaddr;
> +             unsigned int order = 0;
> +             gfp_t mask = gfp_mask;
>               skb_frag_t *f = &skb_shinfo(skb)->frags[i];
>  
> -             page = alloc_page(gfp_mask);
> +             page = skb_frag_page(f);
> +             if (page_count(page) == 1) {
> +                     skb_frag_ref(skb, i);

This could simply be: get_page(page);
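
skb_frag_ref() is, at least in this era of the tree, just a thin
wrapper that ends up taking a page reference (roughly, from
include/linux/skbuff.h):

	static inline void __skb_frag_ref(skb_frag_t *frag)
	{
		get_page(skb_frag_page(frag));
	}

	static inline void skb_frag_ref(struct sk_buff *skb, int f)
	{
		__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
	}

so calling get_page() on the page already in hand is equivalent and
skips re-deriving it from the frag.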

> +                     goto copy_done;
> +             }
> +
> +             if (f->size > PAGE_SIZE) {
> +                     order = get_order(f->size);
> +                     mask |= __GFP_COMP;

Note that a high-order allocation like this is likely to fail under
memory pressure.

We could instead split any oversized frag into multiple order-0 pages,
which are far more likely to be available.

Hopefully this case is not frequent.
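
Something along these lines, perhaps. A rough, untested sketch: error
unwinding, the frag accounting against MAX_SKB_FRAGS, and the existing
page chaining via page->private are all left out, and it assumes the
source frag page is direct-mapped so the mapped address stays valid
across page boundaries:

	/* Copy one oversized zerocopy frag into order-0 pages,
	 * one new frag per page, instead of a single order-N
	 * allocation.
	 */
	u32 off = f->page_offset;
	u32 left = f->size;

	while (left) {
		u32 copy = min_t(u32, left, PAGE_SIZE);
		struct page *npage = alloc_page(gfp_mask);
		u8 *vaddr;

		if (!npage)
			return -ENOMEM;	/* after releasing pages taken so far */

		vaddr = kmap_atomic(skb_frag_page(f));
		memcpy(page_address(npage), vaddr + off, copy);
		kunmap_atomic(vaddr);

		/* record (npage, 0, copy) as the next output frag */
		off += copy;
		left -= copy;
	}

Order-0 allocations are much more reliable under pressure; the cost is
consuming more frag slots per skb.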

