On Fri, Mar 18, 2016 at 8:50 AM, Chandran, Sugesh <sugesh.chand...@intel.com> wrote:
> Hi Jesse,
> Please find my answers inline.
>
> Regards
> _Sugesh
>
>
>> -----Original Message-----
>> From: Jesse Gross [mailto:je...@kernel.org]
>> Sent: Thursday, March 17, 2016 11:50 PM
>> To: Chandran, Sugesh <sugesh.chand...@intel.com>
>> Cc: dev@openvswitch.org
>> Subject: Re: [ovs-dev] [RFC PATCH] tunneling: Improving vxlan performance
>> using DPDK flow director feature.
>>
>> On Thu, Mar 17, 2016 at 3:43 PM, Chandran, Sugesh
>> <sugesh.chand...@intel.com> wrote:
>> > Hi,
>> >
>> > This patch proposes an approach that uses the Flow Director feature on
>> > the Intel Fortville NICs to boost VxLAN tunneling performance. In our
>> > testing we verified that the VxLAN performance is almost doubled with
>> > this patch.
>> > The solution programs the NIC to report a flow ID along with the VxLAN
>> > packets, and that ID is matched by OVS in software. There are corner
>> > cases that need to be addressed in this approach. For example, there is
>> > a possibility of a race condition where the NIC reports a flow ID that
>> > matches a different flow in OVS. This happens when a rule is evicted by
>> > a new rule with the same flow ID + hash in the OVS software. Packets may
>> > hit the wrong new rule in OVS until the flow is also deleted in the
>> > hardware.
>> >
>> > It is a hardware-specific implementation (it only works with Intel
>> > Fortville NICs) for now; however, the proposal works with any
>> > programmable NIC. This RFC shows that OVS can offer very high tunneling
>> > performance using flow programmability in NICs. I am looking for
>> > comments/suggestions on adding this support (such as how to configure
>> > it, enabling it for all programmable NICs, etc.) in the OVS userspace
>> > datapath to improve performance.
>>
>> This is definitely very interesting to see. Can you post some more
>> specific performance numbers?
> [Sugesh]
> VxLAN DECAP performance (unidirectional, single flow, single CPU core)
> ----------------------------------------------------------------------
> PKT-IN  - 9.3 Mpps
> Pkt size - 114-byte VxLAN packets (64-byte payload)
> PKT-OUT - 5.6 Mpps (without optimization)
> PKT-OUT - 9.3 Mpps (with the optimization; it hits the input line rate)
>
> VxLAN ENCAP-DECAP performance (bidirectional, single CPU core)
> ----------------------------------------------------------------------
> PKT-IN  - 9.3 Mpps, PKT SIZE - 114-byte VxLAN packets (64-byte payload) -->
> PKT-IN  - 14 Mpps,  PKT SIZE - 64-byte UDP packets <--
>
> PKT-OUT - 3.6 Mpps (without optimization)
> PKT-OUT - 5.3 Mpps (using the patch)
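As background on the mechanism described above: with the legacy DPDK flow-director API (current around DPDK 16.04), installing a Fortville perfect filter that returns a software-defined ID, and reading that ID back from the received mbuf, looks roughly like the sketch below. This is an illustration only, not code from the patch; the VxLAN-specific fields and the OVS-side integration are omitted, and the helper names are made up for the example.

    /* Illustrative only: program a Fortville perfect filter that matches the
     * outer UDP flow of a VxLAN tunnel and reports a software-defined ID
     * (soft_id) with every matching packet.  Uses the legacy flow-director
     * API from the DPDK 16.04 era; error handling trimmed for brevity. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <rte_byteorder.h>
    #include <rte_eth_ctrl.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    static int
    example_add_vxlan_fdir_filter(uint8_t port_id, uint32_t soft_id,
                                  uint32_t outer_src_ip, uint32_t outer_dst_ip,
                                  uint16_t rx_queue)
    {
        struct rte_eth_fdir_filter fdir;

        memset(&fdir, 0, sizeof fdir);
        fdir.soft_id = soft_id;               /* ID reported with RX packets. */
        fdir.input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP;
        fdir.input.flow.udp4_flow.ip.src_ip = outer_src_ip;
        fdir.input.flow.udp4_flow.ip.dst_ip = outer_dst_ip;
        fdir.input.flow.udp4_flow.dst_port = rte_cpu_to_be_16(4789); /* VxLAN */
        fdir.action.rx_queue = rx_queue;
        fdir.action.behavior = RTE_ETH_FDIR_ACCEPT;
        fdir.action.report_status = RTE_ETH_FDIR_REPORT_ID;

        return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_FDIR,
                                       RTE_ETH_FILTER_ADD, &fdir);
    }

    /* On receive, the reported ID shows up in the mbuf when PKT_RX_FDIR_ID
     * is set; the software lookup can then be keyed on this ID in addition
     * to (or instead of) the full flow key. */
    static inline bool
    example_get_fdir_id(const struct rte_mbuf *m, uint32_t *id)
    {
        if (m->ol_flags & PKT_RX_FDIR_ID) {
            *id = m->hash.fdir.hi;
            return true;
        }
        return false;
    }

How that reported ID is used to shortcut the software lookup, and whether doing so is safe, is what the discussion below is about.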
Thanks, that is interesting to see, particularly for a gateway-type use
case where an appliance is translating between encapsulated and
non-encapsulated packets.

>> Is this really specific to VXLAN? I'm sure that it could be generalized
>> to other tunneling protocols (Geneve would be nice given that OVN is
>> using it and I know Fortville supports it). But shouldn't it apply to
>> non-tunneled traffic as well?
> Yes, this can be applied to any tunneling protocol provided the NIC
> hardware is programmed to handle those packets.
> We haven't tested it with non-tunneled packets. The performance
> improvement for non-tunneled packets is harder to predict, because there
> is a limit on the number of hardware flows (8K on FVL) and software still
> has to spend cycles matching the flow IDs reported by the hardware. This
> improves tunneling performance in all cases, because tunneled packets
> need two lookups instead of one.

Looking at the code some more, I think there are basically two sources of
optimization here:

 * Accelerating the EMC by avoiding netdev_flow_key_equal_mf() on the
assumption that the rule you've installed points exactly to the correct
flow. However, I don't think this is legal because the flows that you are
programming the hardware with don't capture the full set of values in an
OVS flow. For example, in the case of tunnels, there is no match on DMAC.

 * Chaining together the multiple lookups used by tunnels on the assumption
that the outer VXLAN source port distinguishes the inner flow. This would
allow avoiding netdev_flow_key_equal_mf() a second time. This is definitely
not legal because the VXLAN source port is only capturing a small subset of
the total data that OVS is using.

Please correct me if I am wrong.

I'm not sure that I really see any advantage in using a Flow Director
perfect filter to return a software-defined hash value compared to just
using the RSS hash directly as we are doing today. I think the main case
where it would be useful is if hardware wildcarding was used to skip the
EMC altogether and its size constraints. If that was done then I think that
this would no longer be specialized to VXLAN at all.

>> It looks like this is adding a hardware flow when a new flow is added to
>> the datapath. How does this affect flow setup performance?
>>
> We haven't performed any stress tests with a large number of flows to
> verify the flow setup performance. What is the expectation here?
> Currently, how many rules can be set up per second in OVS?

It's hard to give a concrete number here since flow setup performance
depends on the complexity of the flow table and, of course, the machine.
In general, the goal is to avoid needing to do flow setups in response to
traffic, but this depends on the use case. At a minimum, it would be good
to understand the difference in performance as a result of this change and
try to minimize any impact. Since this is really just a hint and we'll need
to deal with mismatches between software and hardware in any case, perhaps
it makes sense to program the hardware flows asynchronously.
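To make the first of the two optimization points above concrete, the EMC shortcut being discussed amounts to something like the sketch below. The names are hypothetical and this is not the actual dpif-netdev code: the cache entry remembers the hardware-reported ID, and a packet carrying a matching ID skips the full key comparison (netdev_flow_key_equal_mf() in OVS), which is exactly where correctness breaks if the hardware filter matches on fewer fields than the software flow key.

    /* Hypothetical sketch of the EMC shortcut under discussion; names do
     * not correspond to the actual OVS dpif-netdev implementation. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    struct flow_key_sketch {
        uint8_t data[64];           /* Stand-in for the packet's miniflow. */
    };

    struct emc_entry_sketch {
        struct flow_key_sketch key; /* Full software key of the cached flow. */
        uint32_t hw_flow_id;        /* ID reported by the NIC for this flow. */
        bool hw_flow_id_valid;
        const void *flow;           /* Cached datapath flow. */
    };

    static const void *
    emc_lookup_sketch(const struct emc_entry_sketch *e,
                      const struct flow_key_sketch *pkt_key,
                      uint32_t pkt_hw_id, bool pkt_hw_id_valid)
    {
        if (e->hw_flow_id_valid && pkt_hw_id_valid
            && e->hw_flow_id == pkt_hw_id) {
            /* Shortcut: trust the hardware match and skip the full key
             * comparison (netdev_flow_key_equal_mf() in OVS).  This is only
             * safe if the hardware filter matches on at least every field
             * the software key matches on; otherwise two distinct OVS flows
             * can share a hardware ID and the wrong flow is returned. */
            return e->flow;
        }
        /* memcmp() stands in for the real miniflow comparison. */
        return memcmp(&e->key, pkt_key, sizeof *pkt_key) == 0 ? e->flow : NULL;
    }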