On Tue, Mar 13, 2018 at 5:51 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
Sorry folks, I added a Mellanox alias (asap_direct_...@mellanox.com) that is
not open to outside posters.
Please remove it from your replies, otherwise your mail will bounce. Or.
> On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <j...@resnulli.us> wrote:
>> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hur...@netronome.com wrote:
>>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>>If a netdev has registered and is a slave of a given bond, then any tc
>>>rules offloaded to the bond will be relayed to it if both the bond and the
>>>slave permit hw offload.
>>>Because the bond itself is not offloaded, just the rules, we don't care
>>>about whether the bond ports are on the same device or whether some of
>>>slaves are representor ports and some are not.
> John, I think we must design here for the case where the bond IS offloaded,
> e.g. some sort of HW LAG. For example, the mlxsw driver supports
> LAG offload and supports tc flower offload, so we need to see how these
> two live together; mlx5 supports tc flower offload and we are working on
> bond offload, etc.
>> Please, no "bond" specific calls from drivers. That would be wrong.
>> The idea behind block callbacks was that anyone who is interested could
>> register to receive those. In this case, slave device is interested.
>> So it should register to receive block callbacks in the same way as if
>> the block was directly on top of the slave device. The only thing you
>> need to handle is to propagate block bind/unbind from master down to the
> This sounds nice for the case where one installs ingress tc rules on
> the bond (let's call them type 1, see next).
> One obstacle pointed out by my colleague, Rabie, is that when the upper layer
> issues a stats call on the filter, it will get two replies. This can confuse
> it and lead to wrong decisions (aging). I wonder if/how we can set a knob
> somewhere that unifies the stats (add packets/bytes, use the latest lastuse).
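
The unification suggested above (sum packets and bytes, keep the most recent lastuse) could look roughly like the sketch below. It is only an illustration: `struct flow_stats` and `merge_flow_stats()` are hypothetical names, not kernel APIs, and the timestamp comparison is a plain integer compare that ignores counter wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port counters; the field names are illustrative only. */
struct flow_stats {
    uint64_t packets;
    uint64_t bytes;
    uint64_t lastuse;   /* time of the last hit, in some monotonic unit */
};

/*
 * Merge the replies of n slave ports into a single reply:
 * packets and bytes are summed, lastuse is the most recent one.
 * Assumption: lastuse values are comparable and do not wrap.
 */
static struct flow_stats merge_flow_stats(const struct flow_stats *per_port,
                                          size_t n)
{
    struct flow_stats total = { 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        total.packets += per_port[i].packets;
        total.bytes += per_port[i].bytes;
        if (per_port[i].lastuse > total.lastuse)
            total.lastuse = per_port[i].lastuse;
    }
    return total;
}
```

With such a merge the upper layer sees one reply per filter, so an aging decision is based on the most recently used slave rather than on whichever reply arrived last.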
> Also, let's see what other rules have to be offloaded in that scheme
> (call them type 2/3/4), where one bonded two HW ports:
> 2. bond being the egress port of a rule
> TC rules for overlay network schemes, e.g. a NIC SRIOV scheme where one
> bonds the two uplink representors
> Starting with type 2: with our current NIC HW APIs we have to duplicate
> each of these rules into two rules set to HW:
> 2.1 VF rep --> uplink 0
> 2.2 VF rep --> uplink 1
> and we do that in the driver (add/del two HW rules, combine the stats
> results, etc.).
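
As a rough sketch of the duplication described for type 2, one logical rule can be fanned out into one HW rule per uplink, with the rules already installed removed again if a later one fails. Everything here is hypothetical: `hw_rule_op`, `lag_rule_add()` and the `fake_*` backend only stand in for a driver's real HW rule API, and the rollback policy is an assumption, not something stated in the thread.

```c
#include <stddef.h>

#define MAX_UPLINKS 2

/* Hypothetical callback type standing in for a driver's HW rule API. */
typedef int (*hw_rule_op)(int uplink, unsigned long cookie);

/*
 * Install one logical rule (VF rep --> bond) as one HW rule per uplink.
 * If any uplink fails, delete the rules installed so far and return
 * the error, so the logical rule is either fully offloaded or not at all.
 */
static int lag_rule_add(hw_rule_op add, hw_rule_op del, unsigned long cookie)
{
    for (int i = 0; i < MAX_UPLINKS; i++) {
        int err = add(i, cookie);

        if (err) {
            while (--i >= 0)
                del(i, cookie);
            return err;
        }
    }
    return 0;
}

/* Fake backend for illustration: tracks which uplinks hold the rule. */
static int installed[MAX_UPLINKS];
static int fail_on = -1;    /* uplink index that rejects the rule, or -1 */

static int fake_add(int uplink, unsigned long cookie)
{
    (void)cookie;
    if (uplink == fail_on)
        return -1;
    installed[uplink] = 1;
    return 0;
}

static int fake_del(int uplink, unsigned long cookie)
{
    (void)cookie;
    installed[uplink] = 0;
    return 0;
}
```

Calling `lag_rule_add(fake_add, fake_del, cookie)` with `fail_on` left at -1 installs the rule on both uplinks; setting `fail_on` to 1 leaves neither uplink holding it.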
> 3. ingress rule on VF rep port with shared tunnel device being the
> egress (encap), and where the routing of the underlay (tunnel) goes
> through LAG. In our case, this is like 2.1/2.2 above: offload two rules,
> combine stats.
> 4. ingress rule with shared tunnel device being the ingress and VF rep
> port being the egress (decap).
> This uses the egdev facility to be offloaded into our driver, and then in
> the driver we will treat it like type 1: two rules need to be installed
> into HW. But now we can't delegate them from the vxlan device, b/c it has
> no direct connection with the bond.
> All in all, for the mlx5 use case, it seems we have an elegant solution
> only for type 1.
> I think we should do the elegant solution for the cases where it is
> applicable. In parallel, if/when newer HW APIs are there such that types 2
> and 3 can be set using one HW rule whose dest is the bond, we are good.
> As for type 4, we need to see if/how it can be made nicer.