On Mon, 2017-05-29 at 17:27 +0200, Paolo Abeni wrote:
> when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> fields are on cold cachelines.
> If the skb are linear and the kernel don't need to compute the udp
> csum, only a handful of skb fields are required by udp_recvmsg().
> Since we already use skb->dev_scratch to cache hot data, and
> there are 32 bits unused on 64 bit archs, use such field to cache
> as much data as we can, and try to prefetch on dequeue the relevant
> fields that are left out.
>
> This can save up to 2 cache miss per packet.

okay ;)

>
> Signed-off-by: Paolo Abeni <pab...@redhat.com>
> ---
>  net/ipv4/udp.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 103 insertions(+), 11 deletions(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 53fa48d..616132e 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1163,6 +1163,83 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
>  	return ret;
>  }
>
> +/* Copy as much information as possible into skb->dev_scratch to avoid
> + * possibly multiple cache miss on dequeue();
> + */
> +#if BITS_PER_LONG == 64
> +
> +/* we can store multiple info here: truesize, len and the bit needed to
> + * compute skb_csum_unnecessary will be on cold cache lines at recvmsg
> + * time.
> + * skb->len can be stored on 16 bits since the udp header has been already
> + * validated and pulled.
> + */
> +struct udp_dev_scratch {
> +	__u32 truesize;
> +	__u16 len;
> +	__u16 is_linear:1;
> +	__u16 csum_unnecessary:1;

What about

	u32 truesize;
	u16 len;
	bool is_linear;
	bool csum_unnecessary;

I do not believe the __ prefix is necessary for a local structure (not uapi)

Also a plain bool or u8 is faster than a bit field (shorter instructions)

Thanks.
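
For illustration only, here is a minimal userspace sketch of the two layouts discussed above. It is not the kernel code: the u32/u16 typedefs, the struct names scratch_bitfield and scratch_bool, and the helpers scratch_pack() and scratch_unpack() are all hypothetical stand-ins. It assumes a typical 64-bit GCC or Clang target, where both layouts occupy 8 bytes and so still fit in one 64-bit scratch word. With the bool layout each flag is read with a plain byte load, whereas the bit field usually needs an extra mask or shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's u32/u16 typedefs (assumption). */
typedef uint32_t u32;
typedef uint16_t u16;

/* Layout as in the quoted patch: two one-bit flags share a 16-bit unit. */
struct scratch_bitfield {
	u32 truesize;
	u16 len;
	u16 is_linear:1;
	u16 csum_unnecessary:1;
};

/* Layout suggested in the review: each flag gets its own byte. */
struct scratch_bool {
	u32 truesize;
	u16 len;
	bool is_linear;
	bool csum_unnecessary;
};

/* Both layouts are expected to fill exactly one 64-bit word here. */
static_assert(sizeof(struct scratch_bitfield) == sizeof(uint64_t),
	      "bit-field layout must fit in 64 bits");
static_assert(sizeof(struct scratch_bool) == sizeof(uint64_t),
	      "bool layout must fit in 64 bits");

/* Hypothetical helper: pack the cached fields into one scratch word. */
static uint64_t scratch_pack(u32 truesize, u16 len, bool linear, bool csum_ok)
{
	struct scratch_bool s = {
		.truesize = truesize,
		.len = len,
		.is_linear = linear,
		.csum_unnecessary = csum_ok,
	};
	uint64_t word;

	memcpy(&word, &s, sizeof(word));
	return word;
}

/* Hypothetical helper: recover the cached fields from the scratch word. */
static struct scratch_bool scratch_unpack(uint64_t word)
{
	struct scratch_bool s;

	memcpy(&s, &word, sizeof(s));
	return s;
}
```

Packing with scratch_pack(768, 1472, true, false) and reading the result back through scratch_unpack() returns the same four values, so nothing is lost by giving each flag a full byte.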