On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:
> Open vSwitch currently configures the kernel datapath via netlink over an
> internal OVS protocol.
>
> This patch series offers a new provider, dpif-netlink-tc, which uses the
> tc flower protocol to offload OVS rules into the HW datapath through
> netdevices that e.g. represent NIC e-switch ports.
>
> The user can create a bridge with datapath_type=dpif-hw-netlink in order
> to use this provider. The provider can then be used to pass tc flower
> rules to the HW for HW offload.
>
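For reference, the bridge setup described above would presumably look
something like the following (bridge name "br0" chosen purely for
illustration):

  ovs-vsctl add-br br0 -- set bridge br0 datapath_type=dpif-hw-netlink
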
> This patch series also introduces a policy module in which the user can
> program a HW-offload policy. The policy module accepts an OVS flow and
> returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY. Currently
> the policy is to HW offload all rules.
>
> If the HW offload rule assignment fails, the provider will fall back to
> the system datapath.
>
> Flower was chosen because it is fairly natural to state OVS DP rules for
> this classifier. However, the code can be extended to support other
> classifiers such as U32, eBPF, etc., which have HW offloads as well.
>
> The use case we are currently addressing is the SRIOV switchdev mode
> newly introduced in Linux kernel 4.8 [1][2]. This series was tested
> against SRIOV VF vport representors of the Mellanox 100G ConnectX-4
> series, exposed by the mlx5 kernel driver.
>
> Paul and Shahar.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=513334e18a74f70c0be58c2eb73af1715325b870
> [2] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=53d94892e27409bb2b48140207c0273b2ba65f61

Thanks for submitting the series. Clearly this is a topic of interest to
multiple parties, and it's a good starting point for discussion.

A few of us also discussed this topic today at netdev, so I'll list a
few points that we talked about and hopefully others can fill in the
bits I miss.

Positives
* The hardware offload decision is made in a userspace module.
* The layered dpif approach means that the tc-based hardware offload could
sit in front of either the kernel or userspace datapaths.
* A separate dpif means that if you don't enable it, it doesn't affect
you, and it doesn't litter another dpif implementation with offload logic.

Drawbacks
* An additional dpif to maintain: another implementation to change when
modifying the dpif interface. Maybe this doesn't change too often, but
there has been some discussion recently about whether the
flow_{put,get,del} operations should be converted to use internal flow
structures rather than the OVS netlink representation. This is one example
of the potential impact on development.
* Fairly limited support for OVS matches and actions. For instance, it is
not yet useful for an OVN-style pipeline. But that's not a limitation of
the design, just of the current implementation.

Other considerations
* Are tc flower filter setup and stats dumping fast enough? How do they
compare to the existing kernel datapath flow setup rate? What about
multiple threads inserting at once? How many filters can be dumped per
second? etc.
* Currently, a given flow will exist in either the offloaded
implementation or the kernel datapath, and statistics are only drawn from
one location. This is consistent with how ofproto-dpif-upcall inserts
flows - one flow_put operation inserts one flow into the datapath, and
correspondingly there is one udpif_key which reflects the most recently
used stats for that datapath flow. There may be situations where flows
need to be in both datapaths, in which case there needs to be either one
udpif_key per datapath representation of the flow, or the dpif must hide
the second flow and aggregate the stats (see the example commands below).
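
As a rough sketch of what aggregation would involve if a flow ended up in
both places, the counters would have to be combined from the tc side and
from the kernel datapath, e.g. (representor netdev name "eth0" chosen
purely for illustration):

  tc -s filter show dev eth0 ingress   # counters for the tc/offloaded copy
  ovs-dpctl dump-flows                 # counters for the kernel datapath copy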

Extra, not previously discussed
* Testing - we may want a mode where tc flower is used purely in software,
to test the tc netlink interface. It would be good to see the kernel
module testsuite extended to at least cover some basics of the interface,
and perhaps also the flower behaviour (though that may be out of scope for
the testsuite in the OVS tree).
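
For example, something along these lines would exercise the tc flower
netlink path entirely in software (device name and match chosen purely for
illustration; skip_hw keeps the filter out of hardware):

  tc qdisc add dev eth0 ingress
  tc filter add dev eth0 ingress protocol ip flower skip_hw \
      ip_proto tcp dst_port 80 action drop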

Thanks,
Joe