Hangbin Liu <[email protected]> wrote:
>On Thu, Feb 26, 2026 at 05:16:55PM -0800, Jay Vosburgh wrote:
>> Hangbin Liu <[email protected]> wrote:
>>
>> >When disabling a port’s collecting and distributing states, updating only
>> >rx_disabled is not sufficient. We also need to set AD_RX_PORT_DISABLED
>> >so that the rx_machine transitions into the AD_RX_EXPIRED state.
>> >
>> >One example is in ad_agg_selection_logic(): when a new aggregator is
>> >selected and old active aggregator is disabled, if AD_RX_PORT_DISABLED is
>> >not set, the disabled port may remain stuck in AD_RX_CURRENT due to
>> >continuing to receive partner LACP messages.
>>
>> I'm not sure I'm seeing the problem here, is there an actual
>> misbehavior being fixed here? The port is receiving LACPDUs, and from
>> the receive state machine point of view (Figure 6-18) there's no issue.
>> The "port_enabled" variable (6.4.7) also informs the state machine
>> behavior, but that's not the same as what's changed by bonding's
>> __disable_port function.
>
>Yes, the reason I do it here is that we select another aggregator and call
>__disable_port() for the old one. If we don't update sm_rx_state, the port
>will be kept in the collecting/distributing state, and the partner will also
>stay in the c/d state.
>
>Here we entered a logical paradox, on one hand we want to disable the port,
>on the other hand we keep the port in collecting/distributing state.

"Disable" here really means disabled from bonding's perspective,
so it's generally equivalent to the backup interface of an active-backup
mode bond.

Such a backup interface is typically carrier up and able to send
or receive packets. The peer generally won't send packets to the backup
interface, however, as no traffic is sent from the backup, and the MAC
for the bond uses a different interface, so no forwarding entries will
direct to the backup interface.

There are a couple of special cases, like LLDP, that are handled
as an exception, but in general, if a peer does send packets to the
backup interface (due to a switch flood, for example), they're dropped.

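To make the drop policy above concrete, here is a minimal sketch in C. It is a toy model, not the bonding driver's code: `struct port_model`, `enum frame_kind`, and `port_should_deliver()` are hypothetical names invented for illustration. Treating LACPDUs as one of the exceptions is an assumption, based on the observation earlier in the thread that a disabled port keeps receiving partner LACPDUs.

```c
#include <stdbool.h>

/* Hypothetical frame classification; not the driver's actual types. */
enum frame_kind {
    FRAME_NORMAL,   /* ordinary data traffic */
    FRAME_LACPDU,   /* assumed exception: still feeds the 802.3ad rx machine */
    FRAME_LLDP      /* special case named in the discussion above */
};

/* Hypothetical per-port view, as seen from the bond. */
struct port_model {
    bool carrier_up;   /* link is up, so frames can physically arrive */
    bool rx_disabled;  /* port is inactive from bonding's perspective */
};

/* Returns true if the frame is passed up the stack, false if dropped. */
static bool port_should_deliver(const struct port_model *port,
                                enum frame_kind kind)
{
    if (!port->carrier_up)
        return false;           /* nothing can arrive on a down link */

    if (!port->rx_disabled)
        return true;            /* active port: deliver everything */

    /* Inactive port: only the special cases get through. */
    return kind == FRAME_LACPDU || kind == FRAME_LLDP;
}
```

In this model, a flooded data frame arriving on a carrier-up but rx-disabled port is dropped, while LLDP and LACPDU frames on the same port are still delivered.
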
>> Where I'm going with this is that, when multiple aggregator
>> support was originally implemented, the theory was to keep aggregators
>> other than the active agg in a state such that they could be put into
>> service immediately, without having to do LACPDU exchanges in order to
>> transition into the appropriate state. A hot standby, basically,
>> analogous to an active-backup mode backup interface with link state up.
>
>This sounds good. But without LACPDU exchange, the hot standby actor and
>partner should be in collecting/distributing state. What should we do when
>partner start send packets to us?

Did you mean "should not be in c/d state" above? I.e., without
LACPDU exchange, ... not in c/d state?

Regardless, as above, the situation is generally equivalent to a
backup interface in active-backup mode: incoming traffic that isn't a
special case is dropped. Normal traffic (bearing the bond source MAC)
isn't sent, as that would update the peer's forwarding table.

Nothing in the standard prohibits us from having multiple
aggregators in c/d state simultaneously. A configuration with two
separate bonds, each with interfaces successfully aggregated together
with their respective peers, wherein those two bonds are placed into a
third bond in active-backup mode, is essentially the same thing as what
we're discussing.

-J

>> I haven't tested this in some time, though, so my question is
>> whether this change affects the failover time when an active aggregator
>> is de-selected in favor of another aggregator. By "failover time," I
>> mean how long transmission and/or reception are interrupted when
>> changing from one aggregator to another. I presume that if aggregator
>> failover after this change requires LACPDU exchanges, etc, it will take
>> longer to fail over.
>
>I haven't tested it yet. I think the failover time should be in 1 second.
>Let me do some testing today.
>
>Thanks
>Hangbin
---
-Jay Vosburgh, [email protected]