On Thu, May 21, 2026 at 11:30:04AM -0300, Jason Gunthorpe wrote:
> 1) Put "iommufd: Fix data_len byte-count vs element-count mismatch"
>    first

OK.

> 2) This "Returning -ENOMEM for allocation failures but 0 for queue
>    overflows treats the conditions differently, which seems to
>    contradict the stated intent." Seems bogus, I think adjust the
>    commit message. We do want 0 for queue full conditions.

Ack.

> 3) Let's fix the "Will this lockless read concurrent with a plain
>    write cause a data race?" by removing the optimization, just
>    pre-allocate and fail. We don't expect this to be a normal
>    condition worth optimizing

I can drop it.

FWIW, it was also added to address a Sashiko review comment:

  By moving the allocation outside the spinlock, the precondition check that
  skipped the allocation when the queue was full is bypassed.

  When the queue is full, which can be common during a hardware fault storm
  if userspace cannot keep up, the code now unconditionally allocates memory,
  copies data, acquires the lock, and then immediately frees the memory and
  drops the event.

  Can this tight loop of wasteful slab allocations, memory copies, and
  deallocations exacerbate IOMMU fault storms by adding unnecessary CPU
  overhead?

  Would it be possible to add an optimistic lockless check, such as
  READ_ONCE(veventq->num_events) < veventq->depth, to bypass the allocation
  when the queue appears full?
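
For reference, a minimal userspace sketch of the simplified flow from 3):
allocate before taking the lock, re-check the depth under the lock, and
drop the event (returning 0) if the queue is full. This is only an
illustration, not the actual patch: struct veventq_model,
veventq_model_report() and the dropped counter are hypothetical names, and
a pthread mutex stands in for the spinlock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event node; not the kernel's actual structure. */
struct vevent {
	struct vevent *next;
	size_t len;
	unsigned char data[];
};

/* Hypothetical userspace model of a bounded event queue. */
struct veventq_model {
	pthread_mutex_t lock;      /* stands in for the spinlock */
	struct vevent *head, *tail;
	unsigned int num_events;   /* only read/written under lock */
	unsigned int depth;        /* maximum number of queued events */
	unsigned int dropped;      /* illustrative counter, an assumption */
};

static void veventq_model_init(struct veventq_model *q, unsigned int depth)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
	q->num_events = 0;
	q->depth = depth;
	q->dropped = 0;
}

/*
 * Returns -ENOMEM on allocation failure and 0 otherwise, including the
 * case where the queue is full and the event is dropped.
 */
static int veventq_model_report(struct veventq_model *q,
				const void *data, size_t len)
{
	/* Allocate and copy unconditionally, outside the lock. */
	struct vevent *ev = malloc(sizeof(*ev) + len);

	if (!ev)
		return -ENOMEM;
	ev->next = NULL;
	ev->len = len;
	memcpy(ev->data, data, len);

	pthread_mutex_lock(&q->lock);
	if (q->num_events >= q->depth) {
		/* Queue full: no lockless pre-check, just drop here. */
		q->dropped++;
		pthread_mutex_unlock(&q->lock);
		free(ev);
		return 0;
	}
	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	q->num_events++;
	pthread_mutex_unlock(&q->lock);
	return 0;
}
```

With a depth of 2, a third report still allocates and copies, finds the
queue full under the lock, frees the event and returns 0. That is the
wasted work the review comment was about, traded for not having an
unlocked read of num_events.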

> 4) I'm OK with ENOMEM here, leave it, EAGAIN should mean it is
>    pollable and it won't become pollable..

Yea. Sashiko would complain about an EAGAIN as well :-)

> 5) The sizeof(hdr) has been fixed in my rc branch. You can rebase on
>    top of that and also ensure to send a base-commit trailer to help
>    Sashiko apply the patches properly

Oh, I forgot to add the base commit ID. Will use your for-rc branch.

> 6) What do you think about the "but done has
>    already been incremented by sizeof(*hdr)" ? unrelated issue? If it
>    is simple lets add a patch here to fix it

I added a patch but didn't include it in the series -- Sashiko would
raise more questions against that patch...

I think it's a separate bug; Sashiko pointed out another one in the fault
queue as well. Both bugs are on failure (corner-case?) paths.

I'd like to address them separately.

Thanks
Nicolin
