On Thu, Feb 26, 2026 at 08:42:27PM -0800, Jay Vosburgh wrote:
> >>    I'm not sure I'm seeing the problem here, is there an actual
> >> misbehavior being fixed here?  The port is receiving LACPDUs, and from
> >> the receive state machine point of view (Figure 6-18) there's no issue.
> >> The "port_enabled" variable (6.4.7) also informs the state machine
> >> behavior, but that's not the same as what's changed by bonding's
> >> __disable_port function.
> >
> >Yes, the reason I do it here is that we select another aggregator and call
> >__disable_port() for the old one. If we don't update sm_rx_state, the port
> >will be kept in collecting/distributing state, and the partner will also
> >stay in c/d state.
> >
> >Here we enter a logical paradox: on one hand we want to disable the port,
> >on the other hand we keep the port in collecting/distributing state.
> 
>       "disable" the port here really means from bonding's perspective,
> so it is generally equivalent to the backup interface of an active-backup
> mode bond.

Oh, got it.

> 
>       Such a backup interface is typically carrier up and able to send
> or receive packets.  The peer generally won't send packets to the backup
> interface, however, as no traffic is sent from the backup, and the MAC
> for the bond uses a different interface, so no forwarding entries will
> direct to the backup interface.
> 
>       There are a couple of special cases, like LLDP, that are handled
> as an exception, but in general, if a peer does send packets to the
> backup interface (due to a switch flood, for example), they're dropped.

OK, this makes sense to me.

> 
> >>    Where I'm going with this is that, when multiple aggregator
> >> support was originally implemented, the theory was to keep aggregators
> >> other than the active agg in a state such that they could be put into
> >> service immediately, without having to do LACPDU exchanges in order to
> >> transition into the appropriate state.  A hot standby, basically,
> >> analogous to an active-backup mode backup interface with link state up.
> >
> >This sounds good. But without LACPDU exchange, the hot standby actor and
                         ^^ I mean with LACPDU exchange..
> >partner should be in collecting/distributing state. What should we do when
> >the partner starts sending packets to us?
> 
>       Did you mean "should not be in c/d state" above?  I.e., without
> LACPDU exchange, ... not in c/d state?
> 
>       Regardless, as above, the situation is generally equivalent to a
> backup interface in active-backup mode: incoming traffic that isn't a
> special case is dropped.  Normal traffic (bearing the bond source MAC)
> isn't sent, as that would update the peer's forwarding table.
> 
>       Nothing in the standard prohibits us from having multiple
> aggregators in c/d state simultaneously.  A configuration with two
> separate bonds, each with interfaces successfully aggregated together
> with their respective peers, wherein those two bonds are placed into a
> third bond in active-backup mode is essentially the same thing as what
> we're discussing.

In theory this looks good. But in practice, when we fail over and disable
the previously active port via
  - __disable_port(port)
    - slave->rx_disabled = 1

This keeps the failed-over port from returning to c/d state. For example, in
my testing (see details in patch 03), we have 4 ports: eth0, eth1, eth2, and
eth3. eth0 and eth1 are in agg1; eth2 and eth3 are in agg2. If we trigger a
failover on eth1, then when eth1 comes back up, the final state is:

3: eth0@if3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000
    bond_slave state BACKUP ad_aggregator_id 1 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> actor_port_prio 10

4: eth1@if4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000
    bond_slave state BACKUP ad_aggregator_id 1 ad_actor_oper_port_state_str <active,short_timeout,aggregating> ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync> actor_port_prio 255

5: eth2@if3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000
    bond_slave state ACTIVE ad_aggregator_id 2 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> actor_port_prio 1000

6: eth3@if4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc noqueue master bond0 state UP mode DEFAULT group default qlen 1000
    bond_slave state ACTIVE ad_aggregator_id 2 ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> ad_partner_oper_port_state_str <active,short_timeout,aggregating,in_sync,collecting,distributing> actor_port_prio 255

7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    bond mode 802.3ad actor_port_prio ad_aggregator 2

As you can see, eth0 is in c/d state, while eth1, in the same aggregator, is
only active and aggregating, even though its partner already reports in_sync.
Do you think this is the correct state?

Thanks
Hangbin
