February 11, 2026 at 19:44, "Sebastian Andrzej Siewior" <[email protected]> wrote:
> On 2026-02-11 14:44:16 [+0800], Jiayuan Chen wrote:
> > From: Jiayuan Chen <[email protected]>
> >
> > On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed
> > concurrently by multiple preemptible tasks on the same CPU.
> >
> > The original code assumes bq_enqueue() and __cpu_map_flush() run
> > atomically with respect to each other on the same CPU, relying on
> > local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
> > local_bh_disable() only calls migrate_disable() and does not disable
> > preemption. spin_lock() also becomes a sleeping rt_mutex. Together,
> > this allows CFS scheduling to preempt a task during bq_flush_to_queue(),
> > enabling another task on the same CPU to enter bq_enqueue() and operate
> > on the same per-CPU bq concurrently.
> …
> > Fixes: d2d6422f8bd1 ("x86: Allow to enable PREEMPT_RT.")
>
> Can you reproduce this? It should not trigger with the commit above.
> It should trigger starting with
> 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")

Thanks for the review, Sebastian. You are right. The race only becomes
possible after the softirq BKL is dropped.

> > Reported-by: [email protected]
> > Closes: https://lore.kernel.org/all/[email protected]/T/
> > Signed-off-by: Jiayuan Chen <[email protected]>
> > Signed-off-by: Jiayuan Chen <[email protected]>
> > ---
> >  kernel/bpf/cpumap.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> > index 04171fbc39cb..7fda8421ec40 100644
> > --- a/kernel/bpf/cpumap.c
> > +++ b/kernel/bpf/cpumap.c
> > @@ -714,6 +717,7 @@ const struct bpf_map_ops cpu_map_ops = {
> >  	.map_redirect = cpu_map_redirect,
> >  };
> >
> > +/* Caller must hold bq->bq_lock */
>
> If this information is important please use lockdep_assert_held() in the
> function below. This can be used by lockdep and is understood by humans
> while the comment is only visible to humans.

Will add lockdep_assert_held() in bq_flush_to_queue() and drop the comment.

> >  static void bq_flush_to_queue(struct xdp_bulk_queue *bq)
> >  {
> >  	struct bpf_cpu_map_entry *rcpu = bq->obj;
> > @@ -750,10 +754,16 @@ static void bq_flush_to_queue(struct xdp_bulk_queue *bq)
> >
> >  /* Runs under RCU-read-side, plus in softirq under NAPI protection.
> >   * Thus, safe percpu variable access.
>
> PREEMPT_RT relies on local_lock_nested_bh().
>
> > + *
> > + * On PREEMPT_RT, local_bh_disable() does not disable preemption,
> > + * so we use local_lock to serialize access to the per-CPU bq.
> >   */
> >  static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
> >  {
> > -	struct xdp_bulk_queue *bq = this_cpu_ptr(rcpu->bulkq);
> > +	struct xdp_bulk_queue *bq;
> > +
> > +	local_lock(&rcpu->bulkq->bq_lock);
>
> local_lock_nested_bh() & the matching unlock here and in the other
> places, please.

Makes sense. Since these paths already run under local_bh_disable(),
local_lock_nested_bh() is the correct primitive.

> Sebastian
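
Roughly what I have in mind for v2, as a sketch only (bq_lock is the
local_lock_t this patch adds to struct xdp_bulk_queue; the unchanged
bodies of both functions are elided):

static void bq_flush_to_queue(struct xdp_bulk_queue *bq)
{
        struct bpf_cpu_map_entry *rcpu = bq->obj;

        /* Replaces the "Caller must hold bq->bq_lock" comment with a
         * check that lockdep can actually enforce.
         */
        lockdep_assert_held(&bq->bq_lock);

        /* ... existing flush logic unchanged ... */
}

static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
{
        struct xdp_bulk_queue *bq;

        /* Callers already run under local_bh_disable(), so the
         * nested-BH flavour is sufficient: it takes a real per-CPU
         * lock only on PREEMPT_RT and is a lockdep annotation
         * otherwise.
         */
        local_lock_nested_bh(&rcpu->bulkq->bq_lock);
        bq = this_cpu_ptr(rcpu->bulkq);

        /* ... existing enqueue logic unchanged ... */

        local_unlock_nested_bh(&rcpu->bulkq->bq_lock);
}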

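The "other places" would get the same treatment. A sketch of the flush
side, assuming the __cpu_map_flush() form that walks a flush_list of
queued bulk queues (names other than bq_lock are from the existing
cpumap.c; the exact shape in the current tree may differ):

void __cpu_map_flush(struct list_head *flush_list)
{
        struct xdp_bulk_queue *bq, *tmp;

        list_for_each_entry_safe(bq, tmp, flush_list, flush_node) {
                /* A bq on the flush list belongs to this CPU, so the
                 * per-CPU lock reached via the map entry protects
                 * exactly this bq instance.
                 */
                local_lock_nested_bh(&bq->obj->bulkq->bq_lock);
                bq_flush_to_queue(bq);
                local_unlock_nested_bh(&bq->obj->bulkq->bq_lock);

                /* Wake the cpumap kthread to drain the ptr_ring. */
                wake_up_process(bq->obj->kthread);
        }
}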