On Fri, Mar 18, 2016 at 8:50 AM, Chandran, Sugesh
<sugesh.chand...@intel.com> wrote:
> Hi Jesse,
> Please find my answers inline.
>
> Regards
> _Sugesh
>
>
>> -----Original Message-----
>> From: Jesse Gross [mailto:je...@kernel.org]
>> Sent: Thursday, March 17, 2016 11:50 PM
>> To: Chandran, Sugesh <sugesh.chand...@intel.com>
>> Cc: dev@openvswitch.org
>> Subject: Re: [ovs-dev] [RFC PATCH] tunneling: Improving vxlan performance
>> using DPDK flow director feature.
>>
>> On Thu, Mar 17, 2016 at 3:43 PM, Chandran, Sugesh
>> <sugesh.chand...@intel.com> wrote:
>> > Hi,
>> >
>> > This patch proposes an approach that uses Flow director feature on the
>> Intel Fortville NICs to boost the VxLAN tunneling performance. In our testing
>> we verified that the VxLAN performance is almost doubled with this patch.
>> > The solution programs the NIC to report the flow ID along with the VxLAN
>> packets, and it is matched by OVS in software. There may be corner cases
>> that needs to addressed in the approach, For eg:  There is a possibility of 
>> race
>> condition where NIC reports flow ID that may match on different flow in
>> OVS. This happen when a rule is evicted by a new rule with same flowID+
>> hash in the OVS software. The packets may hit on wrong new rule in OVS
>> until the flow get deleted in the hardware too.
>> >
>> > It is a hardware specific implementation (Only work with Intel Fortville
>> NICs) for now, however the proposal works with any programmable
>> NICs.This RFC proves that the OVS can offer very high speed tunneling
>> performance using flow programmability in NICs. I am looking for
>> comments/suggestions on adding this support(such as configuring, enable it
>> for all the programmable NICs and etc) in OVS userspace datapath for
>> improving the performance.
>>
>> This is definitely very interesting to see. Can you post some more specific
>> performance numbers?
> [Sugesh]
> VxLAN DECAP performance(Unidirectional, Single flow, Single CPU Core)
> -------------------------------------------------------------------
> PKT-IN - 9.3 Mpps
> Pkt size - 114 byte VxLAN Packets(64 byte payload)
> PKT-OUT - 5.6 Mpps( Without Optimization)
> PKT-OUT - 9.3 Mpps(After the optimization, It hits the Input Line rate)
>
> VxLAN ENCAP-DECAP performance (Bidirectional, Single CPU Core)
> ---------------------------------------------------------------------------------
> PKT-IN - 9.3 Mpps, PKT SIZE - 114 Byte VxLAN Packets (64 Byte payload) -->
> PKT-IN - 14 Mpps, PKT SIZE - 64 Byte UDP packets <--
>
> PKT-OUT - 3.6 Mpps(Without Optimization)
> PKT-OUT - 5.3 Mpps(Using the patch)

Thanks, that is interesting to see, particularly for a gateway-type
use case where an appliance is translating between encapsulated and
non-encapsulated packets.

>> Is this really specific to VXLAN? I'm sure that it could be generalized to 
>> other
>> tunneling protocols (Geneve would be nice given that OVN is using it and I
>> know Fortville supports it). But shouldn't it apply to non-tunneled traffic 
>> as
>> well?
> Yes, this can be applied for any tunneling protocol provided the NIC
> hardware is programmed to handle those packets.
> We haven’t tested it for non-tunneled packets. The performance improvement on
> non-tunneled packets are subjective due to the fact that there
> is a limitation on number of hardware flows(8K on FVL), and software still 
> has to
> spend cycles on matching the flow IDs reported by hardware.  This improves the
> tunneling performance in all the cases, because it tunnel packets needs two 
> lookup than one.

Looking at the code some more, I think there are basically two sources
of optimization here:
 * Accelerating the EMC by avoiding netdev_flow_key_equal_mf() on the
assumption that the rule you've installed points exactly to the
correct flow. However, I don't think this is legal because the flows
that you are programming the hardware with don't capture the full set
of values in an OVS flow. For example, in the case of tunnels, there
is no match on DMAC.
 * Chaining together the multiple lookups used by tunnels on the
assumption that the outer VXLAN source port distinguishes the inner
flow. This would allow avoiding netdev_flow_key_equal_mf() a second
time. This is definitely not legal because the VXLAN source port is
only capturing a small subset of the total data that OVS is using.

Please correct me if I am wrong.

I'm not sure that I really see any advantage in using a Flow Director
perfect filter to return a software defined hash value compared to
just using the RSS hash directly as we are doing today. I think the
main case where it would be useful is if hardware wildcarding was used
to skip the EMC altogether and its size constraints. If that was done
then I think that this would no longer be specialized to VXLAN at all.

>> It looks like this is adding a hardware flow when a new flow is added to the
>> datapath. How does this affect flow setup performance?
>>
> We haven’t performed any stress tests with so many flows to verify the
> flow setup performance. What is the expectation here? Currently how many 
> rules can be
> setup per second in OVS ?

It's hard to give a concrete number here since flow setup performance
depends on the complexity of the flow table and, of course, the
machine. In general, the goal is to avoid needing to do flow setups in
response to traffic but this depends on the use case. At a minimum, it
would be good to understand the difference in performance as a result
of this change and try to minimize any impact. Since this is really
just a hint and we'll need to deal with mismatch between software and
hardware in any case, perhaps it makes sense to program the hardware
flows asynchronously.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to