On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <j...@resnulli.us> wrote:
> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hur...@netronome.com wrote:
>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>If a netdev has registered and is a slave of a given bond, then any tc
>>rules offloaded to the bond will be relayed to it if both the bond and the
>>slave permit hw offload.
>>Because the bond itself is not offloaded, just the rules, we don't care
>>about whether the bond ports are on the same device or whether some of
>>the slaves are representor ports and some are not.
John, I think we must design here for the case where the bond IS offloaded,
e.g. some sort of HW LAG. For example, the mlxsw driver supports LAG
offload and supports tcflower offload, so we need to see how these two
live together; mlx5 supports tcflower offload and we are working on bond
offload, etc.
> Please, no "bond" specific calls from drivers. That would be wrong.
> The idea behind block callbacks was that anyone who is interested could
> register to receive those. In this case, the slave device is interested.
> So it should register to receive block callbacks in the same way as if
> the block was directly on top of the slave device. The only thing you
> need to handle is to propagate block bind/unbind from master down to the
> slaves.
This sounds nice for the case where one installs ingress tc rules on
the bond (let's call them type 1, see below).
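Jiri's suggestion for type 1 can be modelled in a few lines. The sketch
below is a userspace toy, not kernel code: struct bond_dev, struct
slave_dev, bond_block_bind() and the rest are hypothetical names, and
"binding" is reduced to a counter. It shows the two places where
propagation has to happen: when a block is bound/unbound on the master,
and when a slave joins or leaves a bond that already has a block bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLAVES 4

/* Hypothetical slave: tc_offload means its driver registered for block
 * callbacks; bound_blocks counts the blocks it currently receives rules from. */
struct slave_dev {
	const char *name;
	bool tc_offload;
	int bound_blocks;
};

/* Hypothetical bond master with at most one ingress block bound to it. */
struct bond_dev {
	struct slave_dev *slaves[MAX_SLAVES];
	size_t nslaves;
	bool block_bound;
};

static void slave_block_bind(struct slave_dev *s)
{
	if (s->tc_offload)
		s->bound_blocks++;
}

static void slave_block_unbind(struct slave_dev *s)
{
	if (s->tc_offload && s->bound_blocks > 0)
		s->bound_blocks--;
}

/* Block bound on the master: relay the bind to every current slave. */
void bond_block_bind(struct bond_dev *b)
{
	b->block_bound = true;
	for (size_t i = 0; i < b->nslaves; i++)
		slave_block_bind(b->slaves[i]);
}

/* Block unbound from the master: relay the unbind to every current slave. */
void bond_block_unbind(struct bond_dev *b)
{
	for (size_t i = 0; i < b->nslaves; i++)
		slave_block_unbind(b->slaves[i]);
	b->block_bound = false;
}

/* A slave joining a bond that already has a block must be bound too. */
int bond_enslave(struct bond_dev *b, struct slave_dev *s)
{
	assert(b->nslaves <= MAX_SLAVES);
	if (b->nslaves == MAX_SLAVES)
		return -1;
	b->slaves[b->nslaves++] = s;
	if (b->block_bound)
		slave_block_bind(s);
	return 0;
}

/* A slave leaving the bond stops receiving the master's rules. */
void bond_release(struct bond_dev *b, struct slave_dev *s)
{
	for (size_t i = 0; i < b->nslaves; i++) {
		if (b->slaves[i] != s)
			continue;
		if (b->block_bound)
			slave_block_unbind(s);
		b->slaves[i] = b->slaves[--b->nslaves]; /* order is not preserved */
		return;
	}
}
```

Note that in this model every bound slave driver answers a stats request
for the same filter on its own, which is the double-reply problem raised
next.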
One obstacle pointed out by my colleague, Rabie, is that when the upper
layer issues a stats call on the filter, it will get two replies; this can
confuse it and lead to wrong decisions (aging). I wonder if/how we can set
a knob somewhere that unifies the stats (add packets/bytes, use the latest
lastuse).
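A minimal sketch of what such a unifying knob could compute, assuming each
slave driver's reply is available as a separate record. struct flow_stats
and flow_stats_merge() are hypothetical names, and lastuse is treated as a
plain increasing counter (wraparound is ignored for simplicity).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, as each slave driver might report them for the
 * same tc filter. A larger lastuse means a more recent hit. */
struct flow_stats {
	uint64_t packets;
	uint64_t bytes;
	uint64_t lastuse;
};

/* Fold n per-slave replies into one: add packets/bytes, keep the latest
 * lastuse. */
struct flow_stats flow_stats_merge(const struct flow_stats *replies, size_t n)
{
	struct flow_stats sum = {0, 0, 0};

	assert(replies != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		sum.packets += replies[i].packets;
		sum.bytes += replies[i].bytes;
		if (replies[i].lastuse > sum.lastuse)
			sum.lastuse = replies[i].lastuse;
	}
	return sum;
}
```

Taking the most recent lastuse, rather than whichever reply arrives first,
means a flow that only hits one slave is not aged out just because the
other slave's copy of the rule is idle.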
Also, let's see what other rules have to be offloaded in that scheme,
where one bonds two HW ports (call them types 2/3/4):
2. bond being the egress port of a rule
TC rules for the overlay networks scheme, e.g. in the NIC SR-IOV
scheme where one bonds the two uplink representors
Starting with type 2, in our current NIC HW APIs we have to duplicate it
into two rules set to HW:
2.1 VF rep --> uplink 0
2.2 VF rep --> uplink 1
and we do that in the driver (add/del two HW rules, combine the stats).
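The driver-side duplication for type 2 could look roughly like the
following. This is a toy model, not mlx5 code: hw_rule_add()/hw_rule_del()
are hypothetical stand-ins for whatever the driver uses to program a single
HW rule, and fail_on only exists to simulate a failure. It illustrates that
the pair of HW rules behaves as one tc rule: either both are installed or,
after rolling back, neither is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_UPLINKS 2

/* Hypothetical handle for one HW rule: VF rep --> uplink i. */
struct hw_rule {
	bool installed;
	uint64_t packets, bytes;
};

/* One tc rule whose egress is the bond, backed by one HW rule per uplink. */
struct offloaded_flow {
	struct hw_rule rules[NUM_UPLINKS];
};

/* Stand-in for programming one HW rule; fails on uplink fail_on (-1: never). */
static int hw_rule_add(struct hw_rule *r, int uplink, int fail_on)
{
	if (uplink == fail_on)
		return -1;
	r->installed = true;
	r->packets = r->bytes = 0;
	return 0;
}

static void hw_rule_del(struct hw_rule *r)
{
	r->installed = false;
}

/* Install 2.1 and 2.2; on any failure remove what was already installed. */
int flow_add(struct offloaded_flow *f, int fail_on)
{
	assert(f != NULL);
	for (int i = 0; i < NUM_UPLINKS; i++) {
		if (hw_rule_add(&f->rules[i], i, fail_on) != 0) {
			while (--i >= 0)
				hw_rule_del(&f->rules[i]);
			return -1;
		}
	}
	return 0;
}

/* Deleting the tc rule deletes every HW rule that backs it. */
void flow_del(struct offloaded_flow *f)
{
	for (int i = 0; i < NUM_UPLINKS; i++)
		if (f->rules[i].installed)
			hw_rule_del(&f->rules[i]);
}
```

Combining the two HW counters for the stats reply is a per-field sum over
rules[] and is omitted here.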
3. ingress rule on the VF rep port with a shared tunnel device being the
egress, and where the routing of the underlay (tunnel) goes through the LAG.
In our case, this is like 2.1/2.2 above: offload two rules, combine stats.
4. ingress rule with a shared tunnel device being the ingress and the VF
rep port being the egress (decap)
This uses the egdev facility to be offloaded into our driver, and then in
the driver we will treat it like type 1: two rules need to be installed
into HW, but now we can't delegate them from the vxlan device because it
has no direct connection with the bond.
All in all, for the mlx5 use case, it seems we have an elegant solution
only for type 1.
I think we should do the elegant solution where it is applicable.
In parallel, if/when newer HW APIs are there such that types 2 and 3 can
be set using one HW rule whose dest is the bond, we are good. As for
type 4, we need to see if/how it can be nicer.