h Sent from my BlackBerry 10 smartphone. Original Message From: dev-requ...@openvswitch.org Sent: Monday, May 18, 2015 1:50 PM To: dev@openvswitch.org Reply To: dev@openvswitch.org Subject: dev Digest, Vol 70, Issue 91
Send dev mailing list submissions to dev@openvswitch.org To subscribe or unsubscribe via the World Wide Web, visit http://openvswitch.org/mailman/listinfo/dev or, via email, send a message with subject or body 'help' to dev-requ...@openvswitch.org You can reach the person managing the list at dev-ow...@openvswitch.org When replying, please edit your Subject line so it is more specific than "Re: Contents of dev digest..." Today's Topics: 1. Re: [PATCH 1/7] dp-packet: Remove 'list' member. (Daniele Di Proietto) 2. Re: [PATCH 5/7] dpif-netdev: Use memcpy() to initialize pkt_metadata. (Daniele Di Proietto) 3. [PATCH v2 0/7] Userspace datapath performance improvements (Daniele Di Proietto) 4. [PATCH v2 2/7] dp-packet: Remove 'frame' member. (Daniele Di Proietto) ---------------------------------------------------------------------- Message: 1 Date: Mon, 18 May 2015 17:39:17 +0000 From: Daniele Di Proietto <diproiet...@vmware.com> To: Pravin Shelar <pshe...@nicira.com> Cc: "dev@openvswitch.org" <dev@openvswitch.org> Subject: Re: [ovs-dev] [PATCH 1/7] dp-packet: Remove 'list' member. Message-ID: <d17fe238.1441%diproiet...@vmware.com> Content-Type: text/plain; charset="us-ascii" On 15/05/2015 20:50, "Pravin Shelar" <pshe...@nicira.com> wrote: >On Thu, Apr 23, 2015 at 11:39 AM, Daniele Di Proietto ><diproiet...@vmware.com> wrote: >> The 'list' member is only used (two users) in the slow path. >> This commit removes it to reduce the struct size >> >> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com> >> --- >> lib/dp-packet.c | 13 ------------- >> lib/dp-packet.h | 8 -------- >> lib/netdev-dummy.c | 50 >>++++++++++++++++++++++++++++++++++++++++++-------- >> ofproto/ofproto-dpif.c | 30 ++++++++++++++---------------- >> 4 files changed, 56 insertions(+), 45 deletions(-) >> >... >... >> } else if (retval != -EAGAIN) { >> @@ -261,9 +273,15 @@ dummy_packet_stream_run(struct netdev_dummy *dev, >>struct dummy_packet_stream *s) >> static void >> dummy_packet_stream_close(struct dummy_packet_stream *s) >> { >> + struct pkt_list_node *pkt; >> + >> stream_close(s->stream); >> dp_packet_uninit(&s->rxbuf); >> - dp_packet_list_delete(&s->txq); >> + >> + LIST_FOR_EACH_POP(pkt, list_node, &s->txq) { >> + dp_packet_delete(pkt->pkt); >> + free(pkt); >> + } >I see this code duplicated in this file. Can you add function to >delete packets from this list so that is can be used in other >functions. That's a good idea, will include in v2 ------------------------------ Message: 2 Date: Mon, 18 May 2015 17:41:19 +0000 From: Daniele Di Proietto <diproiet...@vmware.com> To: Pravin Shelar <pshe...@nicira.com> Cc: "dev@openvswitch.org" <dev@openvswitch.org> Subject: Re: [ovs-dev] [PATCH 5/7] dpif-netdev: Use memcpy() to initialize pkt_metadata. Message-ID: <d17fe25f.1443%diproiet...@vmware.com> Content-Type: text/plain; charset="us-ascii" On 15/05/2015 20:50, "Pravin Shelar" <pshe...@nicira.com> wrote: >On Thu, Apr 23, 2015 at 11:40 AM, Daniele Di Proietto ><diproiet...@vmware.com> wrote: >> Initializing the dp_packet's metadata can be a hot spot, especially >> for very simple pipelines. Therefore improving the code here can >> sometimes make a difference. >> >> Using memcpy instead of a plain assignment helps GCC and clang generate >> faster code. Here's a comparison of the compiler generated code (GCC >>4.8) >> with or without this commit. >> >> BEFORE (assignment) | AFTER(memcpy) >> >> c8: add $0x8,%r8 | d8: mov (%rsi),%r8 >> mov (%rcx),%r9 | mov (%rdx),%rdi >> mov (%rbx),%r11d | add $0x1,%ecx >> mov %r10,%rcx | add $0x8,%rsi >> cmp %rsi,%r8 | cmp -0x870(%rbp),%ecx >> lea 0x88(%r9),%rdi | mov %rdi,0x88(%r8) >> rep stos %rax,%es:(%rdi) | mov 0x8(%rdx),%rdi >> mov %r11d,0xb8(%r9) | lea 0x88(%r8),%rax >> mov %r8,%rcx | mov %rdi,0x90(%r8) >> jne c8 | mov 0x10(%rdx),%rdi >> | mov %rdi,0x98(%r8) >> | mov 0x18(%rdx),%rdi >> | mov %rdi,0xa0(%r8) >> | mov 0x20(%rdx),%r8 >> | mov %r8,0x20(%rax) >> | mov 0x28(%rdx),%r8 >> | mov %r8,0x28(%rax) >> | mov 0x30(%rdx),%r8 >> | mov %r8,0x30(%rax) >> | jl d8 >> >> The old code uses a 'rep stos' and fetches the 'port_no' value from >> the 'port' member at every iteration ('mov (%rbx),%r11d'), while the >> new code uses a series of mov operation to accomplish everything. >> >> I can measure a through improvement of ~7% on a single flow phy-phy test >> with 64 bytes UDP packets. >> >> The improvement has been observed on an Intel Xeon Sandy Bridge (2012) >> and on an Intel Xeon Westmere (2010). >> >> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com> >> --- >> lib/dpif-netdev.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index f1d65f5..7d55997 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -2507,13 +2507,16 @@ dp_netdev_process_rxq_port(struct >>dp_netdev_pmd_thread *pmd, >> error = netdev_rxq_recv(rxq, packets, &cnt); >> cycles_count_end(pmd, PMD_CYCLES_POLLING); >> if (!error) { >> + const struct pkt_metadata md = >>PKT_METADATA_INITIALIZER(port->port_no); >This change looks good. But I think we can improve it even more by >replacing port->port_no with pkt_metadata. So that we do not need to >initialize this structure on even packet receive. You're right, it is indeed slightly faster (and an assignment is fine, we do not need an explicit memcpy). I'll replace this commit with another one. ------------------------------ Message: 3 Date: Mon, 18 May 2015 18:47:44 +0100 From: Daniele Di Proietto <diproiet...@vmware.com> To: dev@openvswitch.org Subject: [ovs-dev] [PATCH v2 0/7] Userspace datapath performance improvements Message-ID: <1431971271-9178-1-git-send-email-diproiet...@vmware.com> This series contains different tweaks to improve the performance of the userspace datapath with DPDK ports. The first commits reduce the size of struct dp_packet to three cachelines (two used by DPDK and one for our metadata). I've put in also some style fixes for lib/dp-packet.h Then, a microoptimization in the packet metadata initialization (which appears to be a bottleneck for simple workflows), toghether with the dp_packet changes, seems to improve single flow phy2phy throughput The last two commits change the way the userspace datapath handles output batches: this should give a significant improvement to multiple megaflows scenarios v1 -> v2: * Move duplicate list destruction into separate function in netdev-dummy * Store packet metadata initializer in dp_netdev_port * dp_netdev_queue_batches() is called only if the flow has been found. Daniele Di Proietto (7): dp-packet: Remove 'list' member. dp-packet: Remove 'frame' member. dp-packet: Merge 'allocated' member with DPDK mbuf 'buf_len'. dp-packet: Style fixes. dpif-netdev: Store pkt_metadata structure in dp_netdev_port. dpif-netdev: Store batch pointer in dp_netdev_flow. dpif-netdev: Share emc and fast path output batches. lib/dp-packet.c | 36 +---- lib/dp-packet.h | 325 +++++++++++++++++++++++++--------------------- lib/dpif-netdev.c | 145 ++++++++++----------- lib/flow.c | 2 +- lib/netdev-dummy.c | 48 +++++-- lib/packets.c | 6 +- lib/rstp-state-machines.c | 2 +- lib/stp.c | 2 +- ofproto/ofproto-dpif.c | 30 ++--- 9 files changed, 308 insertions(+), 288 deletions(-) -- 2.1.4 ------------------------------ Message: 4 Date: Mon, 18 May 2015 18:47:46 +0100 From: Daniele Di Proietto <diproiet...@vmware.com> To: dev@openvswitch.org Subject: [ovs-dev] [PATCH v2 2/7] dp-packet: Remove 'frame' member. Message-ID: <1431971271-9178-3-git-send-email-diproiet...@vmware.com> In 'struct ofpbuf' the 'frame' pointer was used to parse different kinds of data (Ethernet, OpenFlow, Netlink attributes). For Ethernet packets the 'frame' pointer was supposed to have the same value as the 'data' pointer. Since 'struct dp_packet' is only used for Ethernet packets, there's no need for a separate 'frame' pointer: we can use the 'data' pointer instead. Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com> --- lib/dp-packet.c | 17 ++---------- lib/dp-packet.h | 71 +++++++++++++++++++---------------------------- lib/flow.c | 2 +- lib/packets.c | 6 ++-- lib/rstp-state-machines.c | 2 +- lib/stp.c | 2 +- 6 files changed, 37 insertions(+), 63 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index b2d9d5c..375b7b7 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -27,7 +27,6 @@ dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source so { b->allocated = allocated; b->source = source; - b->frame = NULL; b->l2_pad_size = 0; b->l2_5_ofs = b->l3_ofs = b->l4_ofs = UINT16_MAX; b->md = PKT_METADATA_INITIALIZER(0); @@ -164,12 +163,6 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom) new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer), dp_packet_size(buffer), headroom); - if (buffer->frame) { - uintptr_t data_delta - = (char *)dp_packet_data(new_buffer) - (char *)dp_packet_data(buffer); - - new_buffer->frame = (char *) buffer->frame + data_delta; - } new_buffer->l2_pad_size = buffer->l2_pad_size; new_buffer->l2_5_ofs = buffer->l2_5_ofs; new_buffer->l3_ofs = buffer->l3_ofs; @@ -255,11 +248,6 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom new_data = (char *) new_base + new_headroom; if (dp_packet_data(b) != new_data) { - if (b->frame) { - uintptr_t data_delta = (char *) new_data - (char *) dp_packet_data(b); - - b->frame = (char *) b->frame + data_delta; - } dp_packet_set_data(b, new_data); } } @@ -479,12 +467,11 @@ dp_packet_resize_l2_5(struct dp_packet *b, int increment) dp_packet_pull(b, -increment); } - b->frame = dp_packet_data(b); /* Adjust layer offsets after l2_5. */ dp_packet_adjust_layer_offset(&b->l3_ofs, increment); dp_packet_adjust_layer_offset(&b->l4_ofs, increment); - return b->frame; + return dp_packet_data(b); } /* Adjust the size of the l2 portion of the dp_packet, updating the l2 @@ -495,5 +482,5 @@ dp_packet_resize_l2(struct dp_packet *b, int increment) { dp_packet_resize_l2_5(b, increment); dp_packet_adjust_layer_offset(&b->l2_5_ofs, increment); - return b->frame; + return dp_packet_data(b); } diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 29a883b..54a3445 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -36,25 +36,8 @@ enum OVS_PACKED_ENUM dp_packet_source { ref to build_dp_packet() in netdev-dpdk. */ }; -/* Buffer for holding arbitrary data. An dp_packet is automatically reallocated +/* Buffer for holding packet data. A dp_packet is automatically reallocated * as necessary if it grows too large for the available memory. - * - * 'frame' and offset conventions: - * - * Network frames (aka "packets"): 'frame' MUST be set to the start of the - * packet, layer offsets MAY be set as appropriate for the packet. - * Additionally, we assume in many places that the 'frame' and 'data' are - * the same for packets. - * - * OpenFlow messages: 'frame' points to the start of the OpenFlow - * header, while 'l3_ofs' is the length of the OpenFlow header. - * When parsing, the 'data' will move past these, as data is being - * pulled from the OpenFlow message. - * - * Actions: When encoding OVS action lists, the 'frame' is used - * as a pointer to the beginning of the current action (see ofpact_put()). - * - * rconn: Reuses 'frame' as a private pointer while queuing. */ struct dp_packet { #ifdef DPDK_NETDEV @@ -67,16 +50,14 @@ struct dp_packet { #endif uint32_t allocated; /* Number of bytes allocated. */ - void *frame; /* Packet frame start, or NULL. */ enum dp_packet_source source; /* Source of memory allocated as 'base'. */ - uint8_t l2_pad_size; /* Detected l2 padding size. - * Padding is non-pullable. */ - uint16_t l2_5_ofs; /* MPLS label stack offset from 'frame', or - * UINT16_MAX */ - uint16_t l3_ofs; /* Network-level header offset from 'frame', - or UINT16_MAX. */ - uint16_t l4_ofs; /* Transport-level header offset from 'frame', - or UINT16_MAX. */ + uint8_t l2_pad_size; /* Detected l2 padding size. + * Padding is non-pullable. */ + uint16_t l2_5_ofs; /* MPLS label stack offset, or UINT16_MAX */ + uint16_t l3_ofs; /* Network-level header offset, + * or UINT16_MAX. */ + uint16_t l4_ofs; /* Transport-level header offset, + or UINT16_MAX. */ struct pkt_metadata md; }; @@ -91,7 +72,7 @@ static inline void dp_packet_set_size(struct dp_packet *, uint32_t); void * dp_packet_resize_l2(struct dp_packet *, int increment); void * dp_packet_resize_l2_5(struct dp_packet *, int increment); static inline void * dp_packet_l2(const struct dp_packet *); -static inline void dp_packet_set_frame(struct dp_packet *, void *); +static inline void dp_packet_reset_offsets(struct dp_packet *); static inline uint8_t dp_packet_l2_pad_size(const struct dp_packet *); static inline void dp_packet_set_l2_pad_size(struct dp_packet *, uint8_t); static inline void * dp_packet_l2_5(const struct dp_packet *); @@ -265,18 +246,17 @@ static inline bool dp_packet_equal(const struct dp_packet *a, const struct dp_pa memcmp(dp_packet_data(a), dp_packet_data(b), dp_packet_size(a)) == 0; } -/* Get the start if the Ethernet frame. 'l3_ofs' marks the end of the l2 +/* Get the start of the Ethernet frame. 'l3_ofs' marks the end of the l2 * headers, so return NULL if it is not set. */ static inline void * dp_packet_l2(const struct dp_packet *b) { - return (b->l3_ofs != UINT16_MAX) ? b->frame : NULL; + return (b->l3_ofs != UINT16_MAX) ? dp_packet_data(b) : NULL; } -/* Sets the packet frame start pointer and resets all layer offsets. - * l3 offset must be set before 'l2' can be retrieved. */ -static inline void dp_packet_set_frame(struct dp_packet *b, void *packet) +/* Resets all layer offsets. 'l3' offset must be set before 'l2' can be + * retrieved. */ +static inline void dp_packet_reset_offsets(struct dp_packet *b) { - b->frame = packet; b->l2_pad_size = 0; b->l2_5_ofs = UINT16_MAX; b->l3_ofs = UINT16_MAX; @@ -296,32 +276,40 @@ static inline void dp_packet_set_l2_pad_size(struct dp_packet *b, uint8_t pad_si static inline void * dp_packet_l2_5(const struct dp_packet *b) { - return b->l2_5_ofs != UINT16_MAX ? (char *)b->frame + b->l2_5_ofs : NULL; + return b->l2_5_ofs != UINT16_MAX + ? (char *) dp_packet_data(b) + b->l2_5_ofs + : NULL; } static inline void dp_packet_set_l2_5(struct dp_packet *b, void *l2_5) { - b->l2_5_ofs = l2_5 ? (char *)l2_5 - (char *)b->frame : UINT16_MAX; + b->l2_5_ofs = l2_5 + ? (char *) l2_5 - (char *) dp_packet_data(b) + : UINT16_MAX; } static inline void * dp_packet_l3(const struct dp_packet *b) { - return b->l3_ofs != UINT16_MAX ? (char *)b->frame + b->l3_ofs : NULL; + return b->l3_ofs != UINT16_MAX + ? (char *) dp_packet_data(b) + b->l3_ofs + : NULL; } static inline void dp_packet_set_l3(struct dp_packet *b, void *l3) { - b->l3_ofs = l3 ? (char *)l3 - (char *)b->frame : UINT16_MAX; + b->l3_ofs = l3 ? (char *) l3 - (char *) dp_packet_data(b) : UINT16_MAX; } static inline void * dp_packet_l4(const struct dp_packet *b) { - return b->l4_ofs != UINT16_MAX ? (char *)b->frame + b->l4_ofs : NULL; + return b->l4_ofs != UINT16_MAX + ? (char *) dp_packet_data(b) + b->l4_ofs + : NULL; } static inline void dp_packet_set_l4(struct dp_packet *b, void *l4) { - b->l4_ofs = l4 ? (char *)l4 - (char *)b->frame : UINT16_MAX; + b->l4_ofs = l4 ? (char *) l4 - (char *) dp_packet_data(b) : UINT16_MAX; } static inline size_t dp_packet_l4_size(const struct dp_packet *b) @@ -471,8 +459,7 @@ static inline void dp_packet_set_data(struct dp_packet *b, void *data) static inline void dp_packet_reset_packet(struct dp_packet *b, int off) { dp_packet_set_size(b, dp_packet_size(b) - off); - dp_packet_set_data(b, (void *) ((unsigned char *) b->frame + off)); - b->frame = NULL; + dp_packet_set_data(b, ((unsigned char *) dp_packet_data(b) + off)); b->l2_5_ofs = b->l3_ofs = b->l4_ofs = UINT16_MAX; } diff --git a/lib/flow.c b/lib/flow.c index e54280a..0f9ee50 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -449,7 +449,7 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) /* Initialize packet's layer pointer and offsets. */ l2 = data; - dp_packet_set_frame(packet, data); + dp_packet_reset_offsets(packet); /* Must have full Ethernet header to proceed. */ if (OVS_UNLIKELY(size < sizeof(struct eth_header))) { diff --git a/lib/packets.c b/lib/packets.c index 419c6af..016b12b 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -170,7 +170,7 @@ compose_rarp(struct dp_packet *b, const uint8_t eth_src[ETH_ADDR_LEN]) memcpy(arp->ar_tha, eth_src, ETH_ADDR_LEN); put_16aligned_be32(&arp->ar_tpa, htonl(0)); - dp_packet_set_frame(b, eth); + dp_packet_reset_offsets(b); dp_packet_set_l3(b, arp); } @@ -579,7 +579,7 @@ eth_compose(struct dp_packet *b, const uint8_t eth_dst[ETH_ADDR_LEN], memcpy(eth->eth_src, eth_src, ETH_ADDR_LEN); eth->eth_type = htons(eth_type); - dp_packet_set_frame(b, eth); + dp_packet_reset_offsets(b); dp_packet_set_l3(b, data); return data; @@ -1040,7 +1040,7 @@ compose_arp(struct dp_packet *b, const uint8_t eth_src[ETH_ADDR_LEN], put_16aligned_be32(&arp->ar_spa, ip_src); put_16aligned_be32(&arp->ar_tpa, ip_dst); - dp_packet_set_frame(b, eth); + dp_packet_reset_offsets(b); dp_packet_set_l3(b, arp); } diff --git a/lib/rstp-state-machines.c b/lib/rstp-state-machines.c index 7e23789..f55221f 100644 --- a/lib/rstp-state-machines.c +++ b/lib/rstp-state-machines.c @@ -696,7 +696,7 @@ rstp_send_bpdu(struct rstp_port *p, const void *bpdu, size_t bpdu_size) pkt = dp_packet_new(ETH_HEADER_LEN + LLC_HEADER_LEN + bpdu_size); eth = dp_packet_put_zeros(pkt, sizeof *eth); llc = dp_packet_put_zeros(pkt, sizeof *llc); - dp_packet_set_frame(pkt, eth); + dp_packet_reset_offsets(pkt); dp_packet_set_l3(pkt, dp_packet_put(pkt, bpdu, bpdu_size)); /* 802.2 header. */ diff --git a/lib/stp.c b/lib/stp.c index ec8b01a..22bd93a 100644 --- a/lib/stp.c +++ b/lib/stp.c @@ -1576,7 +1576,7 @@ stp_send_bpdu(struct stp_port *p, const void *bpdu, size_t bpdu_size) pkt = dp_packet_new(ETH_HEADER_LEN + LLC_HEADER_LEN + bpdu_size); eth = dp_packet_put_zeros(pkt, sizeof *eth); llc = dp_packet_put_zeros(pkt, sizeof *llc); - dp_packet_set_frame(pkt, eth); + dp_packet_reset_offsets(pkt); dp_packet_set_l3(pkt, dp_packet_put(pkt, bpdu, bpdu_size)); /* 802.2 header. */ -- 2.1.4 ------------------------------ Subject: Digest Footer _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ------------------------------ End of dev Digest, Vol 70, Issue 91 *********************************** _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev