Nikolay Aleksandrov <[email protected]> wrote:
>On Wed, Mar 04, 2026 at 09:27:28AM -0800, Jay Vosburgh wrote:
>> Nikolay Aleksandrov <[email protected]> wrote:
>>
>> >On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
>> >> From: Jiayuan Chen <[email protected]>
>> >>
>> >> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
>> >> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
>> >> when the bond mode is round-robin. If the bond device was never brought
>> >> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
>> >>
>> >> The XDP redirect path can reach this code even when the bond is not up:
>> >> bpf_master_redirect_enabled_key is a global static key, so when any bond
>> >> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
>> >> interception is enabled for all bond slaves system-wide. This allows the
>> >> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
>> >> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
>> >> reached on a bond that was never opened.
>> >>
>> >> Fix this by allocating rr_tx_counter unconditionally in bond_init()
>> >> (ndo_init), which is called by register_netdevice() and covers both
>> >> device creation paths (bond_create() and bond_newlink()). This also
>> >> handles the case where bond mode is changed to round-robin after device
>> >> creation. The conditional allocation in bond_open() is removed. Since
>> >> bond_destructor() already unconditionally calls
>> >> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
>> >> ndo_init, free at destructor.
>> >>
>> >> Note: rr_tx_counter is only used by round-robin mode, so this
>> >> deliberately allocates a per-cpu u32 that goes unused for other modes.
>> >> Conditional allocation (e.g., in bond_option_mode_set) was considered
>> >> but rejected: the XDP path can race with mode changes on a downed bond,
>> >> and adding memory barriers to the XDP hot path is not justified for
>> >> saving 4 bytes per CPU.
>> >>
>> >> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
>> >> Reported-by: [email protected]
>> >> Closes: https://lore.kernel.org/all/[email protected]/T/
>> >> Signed-off-by: Jiayuan Chen <[email protected]>
>> >> ---
>> >> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
>> >> 1 file changed, 13 insertions(+), 6 deletions(-)
>> >>
>> >
>IMO it's not worth it to waste memory in all modes for an unpopular mode.
>I think it'd be better to add a NULL check in bond_rr_gen_slave_id().
>READ_ONCE()/WRITE_ONCE() should be enough since the counter is allocated
>only once and freed only when the xmit code can no longer be reached
>(otherwise we'd have more bugs already). The branch will be predicted
>correctly practically always, and you can also mark the NULL pointer case
>as unlikely(). That way only RR mode takes a very minimal hit, if any.
>>
>> Is what you're suggesting different from Jiayuan's proposal[0],
>> in the sense of needing barriers in the XDP hot path to ensure ordering?
>>
>> If I understand correctly, your suggestion is something like
>> (totally untested):
>>
>
>Basically yes, that is what I'm proposing, plus an unlikely() around the
>NULL check since it really is unlikely and will always be predicted
>correctly; this way the cost is paid only in RR mode.
Jiayuan,
Do you agree that the patch below (including Nikolay's
suggestion to add "unlikely") resolves the original issue without memory
waste, and without introducing performance issues (barriers) into the
XDP path?
-J
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index eb27cacc26d7..ac2a4fc0aad0 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
>> static int bond_open(struct net_device *bond_dev)
>> {
>> struct bonding *bond = netdev_priv(bond_dev);
>> + u32 __percpu *rr_tx_tmp;
>> struct list_head *iter;
>> struct slave *slave;
>>
>> - if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
>> - bond->rr_tx_counter = alloc_percpu(u32);
>> - if (!bond->rr_tx_counter)
>> + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
>> + !READ_ONCE(bond->rr_tx_counter)) {
>> + rr_tx_tmp = alloc_percpu(u32);
>> + if (!rr_tx_tmp)
>> return -ENOMEM;
>> + WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
>> }
>>
>> /* reset slave->backup and slave->inactive */
>> @@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
>> struct reciprocal_value reciprocal_packets_per_slave;
>> int packets_per_slave = bond->params.packets_per_slave;
>>
>> + if (unlikely(!READ_ONCE(bond->rr_tx_counter)))
>> + packets_per_slave = 0;
>> +
>> switch (packets_per_slave) {
>> case 0:
>> slave_id = get_random_u32();
>>
>> -J
>>
>>
>> [0] https://lore.kernel.org/netdev/[email protected]/
---
-Jay Vosburgh, [email protected]