On 2019-07-09, Petr Mladek <[email protected]> wrote:
>>>> 1. The code claims that the cmpxchg(seq_newest) in
>>>> prb_reserve_desc() guarantees that "The descriptor is ours until
>>>> the COMMITTED bit is set."  This is not true if in that window
>>>> seq_newest wraps, allowing another writer to gain ownership of the
>>>> same descriptor. For small descriptor arrays (such as in my test
>>>> module), this situation is quite easy to reproduce.
>>>
>> Let me inline the function we are talking about and add commentary to
>> illustrate what I am saying:
>> 
>> static int prb_reserve_desc(struct prb_reserved_entry *entry)
>> {
>>      unsigned long seq, seq_newest, seq_prev_wrap;
>>      struct printk_ringbuffer *rb = entry->rb;
>>      struct prb_desc *desc;
>>      int err;
>> 
>>      /* Get descriptor for the next sequence number. */
>>      do {
>>              seq_newest = READ_ONCE(rb->seq_newest);
>>              seq = (seq_newest + 1) & PRB_SEQ_MASK;
>>              seq_prev_wrap = (seq - PRB_DESC_SIZE(rb)) & PRB_SEQ_MASK;
>> 
>>              /*
>>               * Remove conflicting descriptor from the previous wrap
>>               * if ever used. It might fail when the related data
>>               * have not been committed yet.
>>               */
>>              if (seq_prev_wrap == READ_ONCE(rb->seq_oldest)) {
>>                      err = prb_remove_desc_oldest(rb, seq_prev_wrap);
>>                      if (err)
>>                              return err;
>>              }
>>      } while (cmpxchg(&rb->seq_newest, seq_newest, seq) != seq_newest);
>> 
>> I am referring to this point in the code, after the
>> cmpxchg(). seq_newest has been incremented but the descriptor is
>> still in the unused state and seq is still 1 wrap behind. If an NMI
>> occurs here and the NMI (or some other CPU) inserts enough entries to
>> wrap the descriptor array, this descriptor will be reserved again,
>> even though it has already been reserved.
>
> Not really, the NMI will not reach the cmpxchg() in this case.
> prb_remove_desc_oldest() will return error.

Why will prb_remove_desc_oldest() fail? IIUC, it will return success
because the descriptor is in the desc_miss state.

> It will not be able to remove the conflicting descriptor because
> it will still be occupied by a two-wraps-old descriptor.

Please explain why in more detail. Perhaps by providing a function call
chain?  Sorry if I'm missing the obvious here.

This is really the critical point that drove me to use lists: multiple
writers expiring and reserving the same descriptors.
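
To make the window concrete, here is a minimal user-space sketch of the
slot arithmetic only. It is not the ringbuffer code: the demo_* names,
the 4-entry array, the 8-bit sequence mask, and the assumption that a
sequence number selects its descriptor by masking with the array size
are all made up for illustration. It says nothing about whether
prb_remove_desc_oldest() succeeds or fails in that window, which is the
open question above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not the real printk_ringbuffer definitions. */
#define DEMO_DESC_COUNT	4UL	/* tiny descriptor array, power of two */
#define DEMO_SEQ_MASK	0xffUL	/* assumed sequence number mask */

/* Assumption: a sequence number maps to a slot by masking with the size. */
static unsigned long demo_desc_index(unsigned long seq)
{
	return seq & (DEMO_DESC_COUNT - 1);
}

/*
 * A writer has won the cmpxchg() for seq_a but has not touched its
 * descriptor yet. Return the first later sequence number that maps to
 * the same slot, i.e. exactly one wrap of the descriptor array later.
 */
static unsigned long demo_next_conflicting_seq(unsigned long seq_a)
{
	return (seq_a + DEMO_DESC_COUNT) & DEMO_SEQ_MASK;
}

/*
 * Return true if 'extra' further reservations after seq_a (from an NMI
 * or another CPU) reach the slot that seq_a maps to.
 */
static bool demo_slot_reused(unsigned long seq_a, unsigned long extra)
{
	unsigned long i;

	for (i = 1; i <= extra; i++) {
		unsigned long seq = (seq_a + i) & DEMO_SEQ_MASK;

		if (demo_desc_index(seq) == demo_desc_index(seq_a))
			return true;
	}
	return false;
}
```

With these made-up sizes, demo_slot_reused(5, 3) is false and
demo_slot_reused(5, 4) is true: four reservations inside the window are
enough to arrive at the descriptor the interrupted writer believes it
owns. The smaller the array, the fewer entries that takes.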

John Ogness
