On Thu, 2026-06-18 at 20:26 -0400, Tamir Duberstein wrote:
> After consuming the last visible record, ringbuf_process_ring()
> publishes the consumer position and checks the producer position. These
> operations lack a full StoreLoad barrier. A producer can therefore
> commit a new record but read the old consumer position while the
> consumer reads the old producer position. The producer sends no
> notification and the consumer waits despite a queued record.
> 
> Insert a full barrier between publishing a consumer position and the
> next producer position load. When a record bound or callback ends the
> current invocation first, execute the barrier before returning so the
> load in a later invocation completes the same handshake.
> 
> Add an edge-triggered epoll test that drains one record per call while a
> concurrent producer fills the ring. Without the barrier, a missed
> notification leaves the producer dropping records from a full ring while
> the consumer times out. Document that bounded consumers and callbacks
> that terminate consumption must drain before waiting again.
> 
> Fixes: bf99c936f947 ("libbpf: Add BPF ring buffer support")
> Reported-by: Andrew Werner <[email protected]>
> Reported-by: Sashiko <[email protected]>
> Closes: 
> https://lore.kernel.org/bpf/[email protected]/
> Assisted-by: Codex:gpt-5.5
> Signed-off-by: Tamir Duberstein <[email protected]>
> ---

Took me a while, but I agree there is a race here.
FWIW, here is the description of the race as far as I understand it.

Simplified pseudo-code for the consumer (ringbuf_process_ring)

    cons_pos = load_acquire(consumer_pos);      // op#1 [orders before 2,3,4,5]
    do {
      got_new_data = false;
      prod_pos = load_acquire(producer_pos);    // op#2 [orders before 3,4,5]
      while (cons_pos < prod_pos) {
        len_ptr = ... cons_pos ...;
        len = load_acquire(len_ptr);            // op#3 [orders before 4,5]
        got_new_data = true;
        callback(... cons_pos ...);             // op#4
        cons_pos += len;
        store_release(consumer_pos, cons_pos);  // op#5 [orders after 2,3,4; 
ordering relative to 2' not defined]
      }
    } while (got_new_data);

  The patch fixes the issue with op#5, store_release operation is
  ordered after operations 2,3,4, but it's ordering relative to op#2
  from the next outer loop iteration (denoted as 2') is not defined.

Simplified pseudo-code for the producer (bpf_ringbuf_{reserve,commit})

    // reserve
    store_release(producer_pos, ...);

    // commit
    xchg(... clear BUSY_BIT ...);
    cons_pos = load_acquire(consumer_pos);
    if (cons_pos == rec_pos)
        schedule_wakeup();

A possible sequence of events

  Producer:

    // submits a record such that:
    consumer_pos = 0   (initial state)
    producer_pos = 10

  Consumer:

    cons_pos = load_acquire(consumer_pos)     -> cons_pos = 0
    prod_pos = load_acquire(producer_pos)     -> prod_pos = 10
      len = load_acquire(len_ptr)             -> len = 10
      callback(...)                           -> (record at pos 0 consumed)
      cons_pos += len                         -> cons_pos = 10
      store_release(consumer_pos, cons_pos)   -> consumer_pos = 10   (issued; 
global visibility deferred)
      (cons_pos < prod_pos) 10 < 10           -> false, exit inner loop
    while (got_new_data)                      -> true, re-enter outer loop
    prod_pos = load_acquire(producer_pos)     -> prod_pos = 10       (reads 
current value; record at pos 10 not published yet)
      (cons_pos < prod_pos) 10 < 10           -> false, skip inner loop
    while (got_new_data)                      -> false; consumer returns 0 and 
enters an epoll wait

  Producer (reserves and submits a new record):

    store_release(producer_pos, 20)           -> producer_pos = 20   (record at 
pos 10 reserved)
    xchg(... clear BUSY_BIT ...)              -> (record at pos 10 committed)
    cons_pos = load_acquire(consumer_pos)     -> cons_pos = 0        
(consumer's store of 10 not visible yet)
      rec_pos = 10; cons_pos == rec_pos       -> 0 == 10 false -> NO WAKEUP

  Consumer:

    (deferred store becomes visible)          -> consumer_pos = 10   (too late)

---

Consumer exits while a new record is available in the buffer,
and the producer fails to notify the consumer via epoll.
The added barrier prevents such sequence of events.

For the fix:

Reviewed-by: Eduard Zingerman <[email protected]>

Please move the test to a separate commit, I'll review it a bit later.

[...]

Reply via email to