On Tue, Mar 13, 2018 at 5:51 PM, Or Gerlitz <[email protected]> wrote:
Sorry ppl, I added the MLNX alias ([email protected]), which is not open to
outside posts. Please remove it from your replies, otherwise your mail
will bounce back to you.

Or.

> On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <[email protected]> wrote:
>> Mon, Mar 05, 2018 at 02:28:30PM CET, [email protected] wrote:
>>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>>If a netdev has registered and is a slave of a given bond, then any tc
>>>rules offloaded to the bond will be relayed to it if both the bond and the
>>>slave permit hw offload.
>
>>>Because the bond itself is not offloaded, just the rules, we don't care
>>>about whether the bond ports are on the same device or whether some of
>>>slaves are representor ports and some are not.
>
> John, I think we must design here for the case where the bond IS offloaded,
> e.g. some sort of HW LAG. For example, the mlxsw driver supports
> LAG offload and supports tc flower offload, so we need to see how these
> two live together; mlx5 supports tc flower offload and we are working on
> bond offload, etc.
>
>>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>>
>> Please, no "bond" specific calls from drivers. That would be wrong.
>> The idea behind block callbacks was that anyone who is interested could
>> register to receive those. In this case, slave device is interested.
>> So it should register to receive block callbacks in the same way as if
>> the block was directly on top of the slave device. The only thing you
>> need to handle is to propagate block bind/unbind from master down to the
>> slaves.
>
> Jiri,
>
> This sounds nice for the case where one installs ingress tc rules on
> the bond (let's call them type 1, see next).
>
> One obstacle pointed out by my colleague, Rabie, is that when the upper
> layer issues a stats call on the filter, it will get two replies. This
> can confuse it and lead to wrong decisions (aging). I wonder if/how we
> can set a knob somewhere that unifies the stats (add packets/bytes, use
> the latest lastuse).
>
> Also, let's see what other rules have to be offloaded in that scheme
> (call them type 2/3/4) where one bonded two HW ports:
>
> 2. bond being the egress port of a rule
>
> TC rules for an overlay networks scheme, e.g. in a NIC SRIOV
> scheme where one bonds the two uplink representors.
>
> Starting with type 2, in our current NIC HW APIs we have to duplicate
> these rules into two rules set to HW:
>
> 2.1 VF rep --> uplink 0
> 2.2 VF rep --> uplink 1
>
> and we do that in the driver (add/del two HW rules, combine the stat
> results, etc).
>
> 3. ingress rule on a VF rep port with a shared tunnel device being the
> egress (encap) and where the routing of the underlay (tunnel) goes
> through LAG.
>
> In our case, this is like 2.1/2.2 above: offload two rules, combine stats.
>
> 4. ingress rule with a shared tunnel device being the ingress and a VF
> rep port being the egress (decap)
>
> This uses the egdev facility to be offloaded into our driver, and
> then in the driver we will treat it like type 1: two rules need to be
> installed into HW. But now we can't delegate them from the vxlan device
> b/c it has no direct connection with the bond.
>
> All in all, for the mlx5 use case, it seems we have an elegant solution
> only for type 1.
>
> I think we should do the elegant solution for the cases where it is
> applicable.
>
> In parallel, if/when newer HW APIs are there such that types 2 and 3 can
> be set using one HW rule whose dest is the bond, we are good. As for
> type 4, we need to see if/how it can be nicer.
>
> Or.
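
To make the stats unification mentioned above concrete (add packets/bytes, use the latest lastuse), here is a minimal sketch in plain C. The struct and function names are hypothetical and are not kernel API; the timestamp comparison is a plain integer compare and ignores the jiffies wraparound a real driver would have to handle.

```c
#include <stdint.h>

/* Hypothetical per-rule counters as reported by one HW rule. */
struct rule_stats {
	uint64_t packets;
	uint64_t bytes;
	uint64_t lastused;	/* time of the last hit, in some monotonic unit */
};

/*
 * Merge the stats of the two HW rules that back one logical rule:
 * packets and bytes are summed, lastused is the more recent of the two.
 * Assumes lastused does not wrap (unlike jiffies in the kernel).
 */
static struct rule_stats rule_stats_merge(struct rule_stats a,
					  struct rule_stats b)
{
	struct rule_stats out;

	out.packets = a.packets + b.packets;
	out.bytes = a.bytes + b.bytes;
	out.lastused = a.lastused > b.lastused ? a.lastused : b.lastused;
	return out;
}
```

With something like this, the upper layer would see one reply per filter, and aging would be driven by whichever slave saw traffic most recently.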
