On Thu, Feb 26, 2026 at 05:16:55PM -0800, Jay Vosburgh wrote:
> Hangbin Liu <[email protected]> wrote:
>
> >When disabling a port’s collecting and distributing states, updating only
> >rx_disabled is not sufficient. We also need to set AD_RX_PORT_DISABLED
> >so that the rx_machine transitions into the AD_RX_EXPIRED state.
> >
> >One example is in ad_agg_selection_logic(): when a new aggregator is
> >selected and old active aggregator is disabled, if AD_RX_PORT_DISABLED is
> >not set, the disabled port may remain stuck in AD_RX_CURRENT due to
> >continuing to receive partner LACP messages.
>
> 	I'm not sure I'm seeing the problem here, is there an actual
> misbehavior being fixed here?  The port is receiving LACPDUs, and from
> the receive state machine point of view (Figure 6-18) there's no issue.
> The "port_enabled" variable (6.4.7) also informs the state machine
> behavior, but that's not the same as what's changed by bonding's
> __disable_port function.

Yes, the reason I do it here is that we select another aggregator and
call __disable_port() for the old one. If we don't update sm_rx_state,
the port will be kept in the collecting/distributing state, and the
partner will also stay in the c/d state. This is a contradiction: on one
hand we want to disable the port, on the other hand we keep it in the
collecting/distributing state.

>
> 	Where I'm going with this is that, when multiple aggregator
> support was originally implemented, the theory was to keep aggregators
> other than the active agg in a state such that they could be put into
> service immediately, without having to do LACPDU exchanges in order to
> transition into the appropriate state.  A hot standby, basically,
> analogous to an active-backup mode backup interface with link state up.

This sounds good. But without an LACPDU exchange, the hot standby actor
and partner would have to already be in the collecting/distributing
state. What should we do when the partner starts sending packets to us?

>
> 	I haven't tested this in some time, though, so my question is
> whether this change affects the failover time when an active aggregator
> is de-selected in favor of another aggregator.  By "failover time," I
> mean how long transmission and/or reception are interrupted when
> changing from one aggregator to another.  I presume that if aggregator
> failover after this change requires LACPDU exchanges, etc, it will take
> longer to fail over.

I haven't tested it yet. I think the failover time should be within
1 second. Let me do some testing today.

Thanks
Hangbin
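
To make the trade-off discussed above concrete, here is a minimal standalone
sketch. It is a toy model, not the bonding driver code: every toy_* name is
hypothetical, the states only echo the AD_RX_* names, and the transitions
simply encode the two claims from the patch description (a disabled port that
keeps receiving LACPDUs stays in CURRENT unless its RX state is reset, and a
reset port moves to EXPIRED). It assumes the link stays up and LACP stays
enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy states; the names only echo AD_RX_* and are not the kernel's enum. */
enum toy_rx_state {
	TOY_RX_PORT_DISABLED,
	TOY_RX_EXPIRED,
	TOY_RX_CURRENT,
};

struct toy_port {
	enum toy_rx_state rx_state;
	bool rx_disabled;	/* stands in for the flag __disable_port() updates */
};

/* Hypothetical disable helper: optionally also resets the RX state. */
static void toy_disable_port(struct toy_port *p, bool reset_rx_state)
{
	assert(p != NULL);
	p->rx_disabled = true;
	if (reset_rx_state)
		p->rx_state = TOY_RX_PORT_DISABLED;
}

/* One tick of the toy RX machine. */
static void toy_rx_step(struct toy_port *p, bool lacpdu_received)
{
	assert(p != NULL);
	if (p->rx_state == TOY_RX_PORT_DISABLED)
		p->rx_state = TOY_RX_EXPIRED;	/* assumed: link up, LACP enabled */
	else if (lacpdu_received)
		p->rx_state = TOY_RX_CURRENT;	/* partner info refreshed */
}

/* Disable a port that is CURRENT, then run `ticks` steps with LACPDUs arriving. */
static enum toy_rx_state toy_run(bool reset_rx_state, int ticks)
{
	struct toy_port p = { .rx_state = TOY_RX_CURRENT, .rx_disabled = false };

	toy_disable_port(&p, reset_rx_state);
	for (int i = 0; i < ticks; i++)
		toy_rx_step(&p, true);
	return p.rx_state;
}
```

In this model, toy_run(false, n) never leaves TOY_RX_CURRENT, while
toy_run(true, 1) lands in TOY_RX_EXPIRED and needs one more LACPDU to get back
to TOY_RX_CURRENT. That extra exchange is the failover cost being asked about;
how long it takes in the real driver still has to be measured.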

