https://bugs.dpdk.org/show_bug.cgi?id=483
            Bug ID: 483
           Summary: Bond 8023ad lacp handshake sometimes fail
           Product: DPDK
           Version: 19.11
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: iobey...@126.com
  Target Milestone: ---

My bond has two ports, and the two hosts are connected through a switch. I enabled the DPDK debug output with the RTE_LIBRTE_BOND_DEBUG_8023AD macro.

Port 0 MAC: ac:f9:70:88:f3:26
Port 1 MAC: ac:f9:70:88:f3:27
BOND MAC:   ac:f9:70:88:f3:26

When tx_machine sends LACP frames with Port 1's MAC (ac:f9:70:88:f3:27) as the actor system ID, the handshake fails.

When the LACP handshake fails, the log looks like this:

----------
997 [Port 0: rx_machine] -> INITIALIZE
997 [Port 0: periodic_machine] -> NO_PERIODIC ( begind LACP active )
997 [Port 0: mux_machine] -> DETACHED
997 [Port 0: selection_logic] -> SELECTED: ID= 1
        aggregator found aggregator ID= 1
997 [Port 0: mux_machine] DETACHED -> WAITING
1995 [Port 1: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:27, key=2100, p_pri=FF00 p_num=0200
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
1995 [Port 0: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:27, key=2100, p_pri=FF00 p_num=0100
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
2095 [Port 1: mux_machine] ATTACHED Entered
2594 [Port 1: tx_machine] Sending LACP frame
----------

When the LACP handshake succeeds, the log looks like this:

----------
0 [Port 0: rx_machine] -> INITIALIZE
0 [Port 0: periodic_machine] -> NO_PERIODIC ( begind LACP active )
0 [Port 0: mux_machine] -> DETACHED
99 [Port 0: mux_machine] DETACHED -> WAITING
Waiting for slaves to become active...
Port 2 MAC: ac:f9:70:88:f3:26
236 [Port 1: rx_machine] -> INITIALIZE
236 [Port 1: periodic_machine] -> NO_PERIODIC ( begind LACP active )
236 [Port 1: mux_machine] -> DETACHED
236 [Port 1: selection_logic] -> SELECTED: ID= 0
        aggregator found aggregator ID= 0
236 [Port 1: mux_machine] DETACHED -> WAITING
1034 [Port 0: tx_machine] Sending LACP frame
1034 [Port 0: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:26, key=2100, p_pri=FF00 p_num=0100
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
1234 [Port 1: tx_machine] Sending LACP frame
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=FFFF, system=AC:F9:70:88:F3:26, key=2100, p_pri=FF00 p_num=0200
    state={ ACT AGG DEF EXP }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=00:00:00:00:00:00, key=0100, p_pri=FF00 p_num=0000
    state={ ACT TIMEOUT AGG }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
2032 [Port 0: tx_machine] Sending LACP frame
2332 [Port 1: rx_machine] LACP -> CURRENT
bond_print_lacp(122) - LACP: {
  subtype= 01
  ver_num=01
  actor={ tlv=01, len=14
    pri=0080, system=F8:98:EF:69:83:91, key=417F, p_pri=0080 p_num=0600
    state={ ACT TIMEOUT AGG }
  }
  partner={ tlv=02, len=14
    pri=FFFF, system=AC:F9:70:88:F3:26, key=2100, p_pri=FF00 p_num=0200
    state={ ACT AGG DEF EXP }
  }
  collector={info=03, length=10, max_delay=0000 , type_term=00, terminator_length = 00 }
2332 [Port 1: mux_machine] ATTACHED Entered
----------

From my observation: whenever the log prints "SELECTED: ID= 1", the wrong MAC address is used to send LACP frames.
The selection_logic() function chooses the wrong aggregator_port_id here (rte_eth_bond_8023ad.c:749):

    case AGG_STABLE:
        if (default_slave == slaves_count)
            new_agg_id = slaves[slave_id];
        else
            new_agg_id = slaves[default_slave]; // sometimes new_agg_id will be 1

Why does the LACP handshake succeed sometimes? The "slaves" array is filled in a non-deterministic order by activate_slave(). When port 0 is placed in slaves[0], it works correctly.