On 12/10/2016 23:36, Pravin Shelar wrote:
Sorry for jumping in a bit late. I have a couple of high-level comments below.

On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ro...@mellanox.com> wrote:
From: Joe Stringer [mailto:j...@ovn.org]  Sent: Thursday, October 06, 2016 5:06 AM
Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:
Openvswitch currently configures the kernel datapath via netlink over an
internal ovs protocol.
This patch series offers a new provider, dpif-netlink-tc, that uses the
tc flower protocol to offload ovs rules into the HW datapath through
netdevices that, e.g., represent NIC e-switch ports.
To use this provider, the user creates a bridge with
datapath_type=dpif-hw-netlink.
The provider passes the tc flower rules to the HW for HW offload.
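For example, using the datapath_type named above (the bridge name br0 is
just a placeholder):

    ovs-vsctl add-br br0 -- set bridge br0 datapath_type=dpif-hw-netlink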
This series also introduces a policy module in which the user can program
a HW-offload policy. The policy module accepts an ovs flow and returns a
policy decision for each flow: NO_OFFLOAD or HW_ONLY. Currently the policy
is to HW-offload all rules.
If the HW offload rule assignment fails, the provider falls back to the
system datapath.
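In C terms, a minimal sketch of that policy interface might look like the
following (the type and function names are our illustration, not the
series' actual symbols; struct dpif_flow_put is the existing type from
OVS's lib/dpif.h):

    #include "dpif.h"   /* struct dpif_flow_put, from the OVS tree */

    /* Possible policy decisions, as described above. */
    enum hw_offload_decision {
        NO_OFFLOAD,   /* leave the flow in the software datapath */
        HW_ONLY,      /* program the flow through tc flower only */
    };

    /* The current dummy policy: HW-offload every rule.  A smarter policy
     * would inspect the flow (and, if the kernel ever exposes them, the
     * HW capabilities) before deciding. */
    static enum hw_offload_decision
    hw_offload_policy_decide(const struct dpif_flow_put *put)
    {
        (void) put;
        return HW_ONLY;
    }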
Flower was chosen because it is fairly natural to express OVS datapath
rules with this classifier. However, the code can be extended to support
other classifiers such as u32 or eBPF, which have HW offloads as well.

The use case we are currently addressing is the newly introduced SRIOV
switchdev mode in the Linux kernel, which appeared in version 4.8 [1][2].
This series was tested against SRIOV VF port representors of the Mellanox
ConnectX-4 100G series, exposed by the mlx5 kernel driver.
Paul and Shahar.

[1] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=513334e18a74f70c0be58c2eb73af1715325b870
[2] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=53d94892e27409bb2b48140207c0273b2ba65f61
Thanks for submitting the series. Clearly this is a topic of interest for
multiple parties, and it's a good starting point for discussion.

A few of us also discussed this topic today at netdev, so I'll list a few
points that we talked about, and hopefully others can fill in the bits I miss.
Thanks for summarizing our meeting today.
Attached is a link to the PDF picture that shows the idea (a picture <= 1,000 words):
https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing

Positives
* Hardware offload decision is made in a module in userspace
* Layered dpif approach means that the tc-based hardware offload could sit
in front of kernel or userspace datapaths
* Separate dpif means that if you don't enable it, it doesn't affect you.
Doesn't litter another dpif implementation with offload logic.

Because of better modularity and usage of existing kernel interfaces
for flow offload, I like this approach.

Drawbacks
* An additional dpif to maintain; another implementation to change when
modifying the dpif interface. Maybe this doesn't change too often, but
there have been some discussions recently about whether flow_{put,get,del}
should be converted to use internal flow structures rather than the OVS
netlink representation. This is one example of potential impact on
development.
[RONY] You are right, but I don't think we can add it any other way. I
think that the approach of reusing dpif-netlink will save us a lot of
maintenance.
* Fairly limited support for OVS matches and actions. For instance, it is
not yet useful for an OVN-style pipeline. But that's not a limitation of
the design, just the current implementation.
[RONY] Sure, we intend to support OVN and connection tracking; we are
starting with the simple case.

Other considerations
* Is the tc flower filter setup rate and stats dump fast enough? How does
it compare to the existing kernel datapath flow setup rate? What about
multiple threads inserting at once? How many filters can be dumped per
second? Etc.
[RONY] We will test it, and we will try to improve TC if needed.

I think there are two parts to flow offloading:
1. Time spent adding the flow to TC.
2. Time spent pushing the flow to hardware.

It would be interesting to know which one is dominant in this case.

We achieve about 1K rule insertions per second; we will look into the time
distribution.
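A rough harness for that measurement, assuming a hypothetical
tc_insert_flower() wrapper around the provider's RTM_NEWTFILTER request
(the wrapper is a placeholder, not real code from the series):

    #include <stdio.h>
    #include <time.h>

    extern int tc_insert_flower(int prio);   /* hypothetical insert call */

    int
    main(void)
    {
        enum { N_FLOWS = 1000 };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N_FLOWS; i++) {
            tc_insert_flower(i + 1);         /* one flower filter per prio */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                      + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d inserts in %.3f s (%.0f flows/s)\n",
               N_FLOWS, secs, N_FLOWS / secs);
        return 0;
    }

Running the same loop once with the flower skip_hw flag (software
classifier only) and once with skip_sw should help separate the TC
bookkeeping cost from the hardware-push cost.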

* Currently, a given flow will exist in either the offloaded
implementation or the kernel datapath, and statistics are only drawn from
one location. This is consistent with how ofproto-dpif-upcall inserts
flows: one flow_put operation, and one flow is inserted into the datapath.
Correspondingly, there is one udpif_key which reflects the most recently
used stats for this datapath flow. There may be situations where flows
need to be in both datapaths, in which case there needs to be either one
udpif_key per datapath representation of the flow, or the dpif must hide
the second flow and aggregate the stats.
[RONY] As you wrote, the dpif is responsible for hiding it; if the flow is
offloaded to the HW, this traffic won't come to the software datapath.
We will handle this when we support that combination.
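If that combination is ever supported, the aggregation option mentioned
above could look roughly like this (struct dpif_flow_stats is the existing
OVS type; the helper itself is our illustration):

    #include "dpif.h"   /* struct dpif_flow_stats, from the OVS tree */
    #include "util.h"   /* MAX() */

    /* Hypothetical helper: sum the counters from the hardware and
     * software copies of the same flow before reporting them upward. */
    static void
    flow_stats_aggregate(const struct dpif_flow_stats *hw,
                         const struct dpif_flow_stats *sw,
                         struct dpif_flow_stats *out)
    {
        out->n_packets = hw->n_packets + sw->n_packets;
        out->n_bytes = hw->n_bytes + sw->n_bytes;
        out->used = MAX(hw->used, sw->used);   /* most recent use wins */
        out->tcp_flags = hw->tcp_flags | sw->tcp_flags;
    }
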
Extra, not previously discussed
* Testing - we may want a mode where tc flower is used in software mode, to
test the tc netlink interface. It would be good to see the kernel module
testsuite extended to at least test some basics of the interface, and
perhaps also the flower behaviour (though that may be out of scope of the
testsuite in the OVS tree).
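For instance, flower already supports a software-only mode via the skip_hw
flag, so the same tc netlink interface can be exercised without a capable
NIC (device name and match below are placeholders):

    tc qdisc add dev eth0 ingress
    tc filter add dev eth0 ingress protocol ip flower skip_hw \
        ip_proto tcp action drop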

I have a question about hardware offload capability. How are the
capabilities checked in the dpif module for accelerating a particular
flow? Or does it try to offload and fall back to the software datapath in
case of an error?

For now there is no kernel API to check offload capabilities, so our dpif
provider will try to offload any flow that can be entirely translated from
OVS attributes to TC/flower terms, since the dummy offload policy always
returns HW offload. The offload policy could check HW capabilities at a
later date if such an API is exposed. If an unsupported OVS attribute is
masked out (wildcarded/zeroed mask), it is ignored; if an attribute is
supported but TC doesn't support masking it, an exact match is used. If
TC/flower/HW fails to offload a flow, it will be directed to dpif-netlink
for the software datapath.
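Put together, the put path described above has roughly this shape in C.
Every identifier here is illustrative (reusing hw_offload_policy_decide()
from the sketch earlier in the thread), not the series' actual functions:

    #include "dpif.h"   /* struct dpif, struct dpif_flow_put */

    struct tc_flower {                  /* flower-level flow; fields elided */
        int placeholder;
    };

    /* Hypothetical helpers: translation, TC insert, and SW fallback. */
    extern int parse_flow_put_to_flower(const struct dpif_flow_put *,
                                        struct tc_flower *);
    extern int tc_replace_flower(struct tc_flower *);
    extern int sw_dpif_netlink_flow_put(struct dpif *,
                                        const struct dpif_flow_put *);

    static int
    dpif_hw_netlink_flow_put(struct dpif *dpif,
                             const struct dpif_flow_put *put)
    {
        struct tc_flower flower;

        if (hw_offload_policy_decide(put) == NO_OFFLOAD  /* policy says no */
            || parse_flow_put_to_flower(put, &flower)    /* untranslatable */
            || tc_replace_flower(&flower)) {             /* TC/HW rejected */
            /* Fall back to the software datapath. */
            return sw_dpif_netlink_flow_put(dpif, put);
        }
        return 0;
    }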
