On Tue, Sep 29, 2015 at 10:50 PM, <[email protected]> wrote:
> Hi-
>
> I have been conducting scaling tests with OVS and Docker. My tests revealed
> that the latency of ARP packets can become very large, resulting in many ARP
> retransmissions and timeouts. I found the source of the poor latency to be
> the handling of ARP packets in ovs_vport_find_upcall_portid(). Each
> packet is hashed in ovs_vport_find_upcall_portid() by calling
> skb_get_hash(). This hash is used to select a netlink socket on which to
> send the packet to userspace. However, skb_get_hash() does not support ARP
> packets and returns 0 (an invalid hash) for every ARP. As a result, a
> single ovs-vswitchd handler thread processes every ARP packet, which
> severely impacts the average latency of ARPs. I am proposing a change to
> ovs_vport_find_upcall_portid() that spreads ARP packets evenly across
> all the handler threads (patch to follow). Please let me know if you have
> any suggestions or comments.
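The socket-selection problem described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the datapath code: it assumes the socket index is the packet hash reduced modulo the number of handler sockets, and `select_upcall_index()` and `count_per_handler()` are hypothetical helpers introduced only for illustration. It shows why a constant hash of 0 pins every ARP to the same socket, while varied hashes spread across all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of per-vport upcall socket selection: the packet
 * hash is reduced modulo the number of handler sockets (n_ids).
 * This is an assumption for illustration, not the kernel function. */
static uint32_t select_upcall_index(uint32_t hash, uint32_t n_ids)
{
    return hash % n_ids;
}

/* Count how many of n_pkts packets, whose hashes are in hashes[],
 * land on each of n_ids handler sockets. counts[] must have room
 * for n_ids entries. */
static void count_per_handler(const uint32_t *hashes, size_t n_pkts,
                              uint32_t n_ids, unsigned *counts)
{
    for (uint32_t h = 0; h < n_ids; h++)
        counts[h] = 0;
    for (size_t i = 0; i < n_pkts; i++)
        counts[select_upcall_index(hashes[i], n_ids)]++;
}
```

For example, eight ARP packets that all hash to 0, spread over four handlers, give per-handler counts of {8, 0, 0, 0}, whereas eight packets with hashes 1 through 8 give {2, 2, 2, 2}.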
This is definitely an interesting analysis, but I'm a little surprised by the
basic scenario. First, it seems to me that the L2 domain is too large if there
are this many ARPs. The processing speed also seems slower than I would
expect, but in any case I don't disagree that it is better to spread the load
among all the cores.

On the patch itself, can't we just make skb_get_hash() able to decode ARP?
That seems cleaner and more generic.

_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss
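As a rough illustration of the suggestion to let the hash cover ARP, the sketch below derives a flow hash from the ARP sender and target IPv4 addresses. The `struct arp_fields` type and `arp_flow_hash()` are hypothetical; a real change would live in the kernel's flow dissection code and use the kernel's own hash function, whereas FNV-1a is swapped in here only to keep the example self-contained. Because 0 is treated as "no valid hash", a computed 0 is remapped to 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of an ARP header: sender and target IPv4
 * addresses, stored as host-order 32-bit values. */
struct arp_fields {
    uint32_t sender_ip;
    uint32_t target_ip;
};

/* Fold the four bytes of v, least significant first, into a running
 * 32-bit FNV-1a state h. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u; /* 32-bit FNV prime */
    }
    return h;
}

/* Hypothetical ARP-aware flow hash. FNV-1a stands in for whatever hash
 * the kernel would really use. Never returns 0, since 0 means
 * "no valid hash". */
static uint32_t arp_flow_hash(const struct arp_fields *arp)
{
    uint32_t h = 2166136261u; /* 32-bit FNV offset basis */
    h = fnv1a_u32(h, arp->sender_ip);
    h = fnv1a_u32(h, arp->target_ip);
    return h ? h : 1;
}
```

With a hash like this, ARP requests from different senders would generally land on different handler sockets instead of all sharing one. Whether the IP addresses, the MAC addresses, or both are the right inputs is a design choice this sketch does not settle.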
