A patch was sent to "fix" the wait_index variable that is used to help with
waking of waiters on the ring buffer. The patch was rejected, but I started
looking at associated code. Discussing it on IRC with Mathieu Desnoyers
we discovered a design flaw.

The waiter reads "wait_index" then enters a "wait loop". After adding
itself to the wait queue, if the buffer is filled or a signal is pending
it will exit out. If wait_index is different, it will also exit out.
(Note, I noticed that the check for wait_index was after the schedule()
call and should have been before it, but that's besides the point).

The race is what happens if the waker updates the wait_index,
a new waiter comes in and reads the updated wait_index before it adds
itself to the queue, it will miss the update!

This is a series of changes to fix the design, and other bugs found
along the way.

1) Remove the wait loop in the ring_buffer_wait() code. It doesn't
   have enough context to know if it should continue to wait.

2) Fix "shortest_full" accounting. When zero, it gets set to the
   smallest percentage requested of the buffer before the writers need to
   start waking up waiters. But after waiters are all woken up,
   it never gets reset, so the smallest value will always be the
   smallest value until the buffer itself is reset.

3) The wake up of readers on close was incorrectly added to the
   .release() callback and not the .flush(). That is, waiters
   in the .read() call, will never be woken up because .release()
   isn't called until all .read()s have finished. Move the wakeup
   to .flush() which is called immediately by close().

4) Add a "waking" field to the trace iterator that gets set when a
   waker wants to wake up the readers. For the .flush() callback,
   it is set and never cleared to make sure new readers do not block.

5/6) Break up ring_buffer_wait() into three functions:

  ring_buffer_prepare_to_wait()
  ring_buffer_wait()
  ring_buffer_finish_wait()

    This allows the caller to add itself to the wait queue, check
    if its own condition has been set (in this case: iter->waking)
    and then sleep. Follows the same semantics as any other wait logic.

Steven Rostedt (Google) (6):
      ring-buffer: Fix waking up ring buffer readers
      ring-buffer: Fix resetting of shortest_full
      tracing: Use .flush() call to wake up readers
      tracing: Fix waking up tracing readers
      ring-buffer: Restructure ring_buffer_wait() to prepare for updates
      tracing/ring-buffer: Fix wait_on_pipe() race

----
 include/linux/ring_buffer.h  |   4 +
 include/linux/trace_events.h |   3 +-
 kernel/trace/ring_buffer.c   | 251 +++++++++++++++++++++++++++----------------
 kernel/trace/trace.c         | 128 ++++++++++++++++++----
 4 files changed, 269 insertions(+), 117 deletions(-)

Reply via email to