Hi,

I'm sorry, I have some concern about the patch.

How it works, as far as I understand:

- DPDK simulates interrupts in user mode with epoll_wait()
- mlx5 PMD emits the async counter query command to the NIC periodically 
- there might be multiple async query commands in the flight
- kernel drivers handles the async query completion interrupts, pushes the 
token to the internal completion queue and unblocks associated fd
- epoll_wait() sees this unblocked fd and notifies mlx5 PMD about
- mlx5 PMD reads the completion token from the kernel queue with 
devx_get_async_cmd_comp()

The concern scenario, let's assume:

- we have 2 async query commands in the flight
- the first async query completes, fd is unblocked, PMD is inviked, the 
completion is read by PMD and is being handled
- the second async query completes, fd gets unblocked, the second token is 
written to the queue
- the PMD completes the handling of the first completion and reads the queue 
again (with devx_get_async_cmd_comp() call in the loop)
- it reads the second token successfully and handles
- then, on the third call, devx_get_async_cmd_comp() returns EAGAIN, it means 
queue is empty
- DPDK calls epoll_wait() again and sees unblocked fd
- it call mlx5 PMD, and it calls devx_get_async_cmd_comp(), but queue is empty 
(handled in previous interrupt handling)
- with the patch we wrongly remove the handler 

In my opinion, we should handle flags EPOLLERR | EPOLLHUP | EPOLLRDHU from the 
epoll_wait()_return also for
RTE_INTR_HANDLE_EXT and RTE_INTR_HANDLE_DEV_EVENT interrupt types.

With best regards,
Slava
 



> -----Original Message-----
> From: Kevin Traynor <[email protected]>
> Sent: Tuesday, February 10, 2026 5:06 PM
> To: Stephen Hemminger <[email protected]>
> Cc: [email protected]; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <[email protected]>; [email protected]; Dariusz Sosnowski
> <[email protected]>; Slava Ovsiienko <[email protected]>;
> [email protected]
> Subject: Re: [PATCH v2 1/2] net/mlx5: check for no data read in devx interrupt
> 
> On 07/02/2026 06:09, Stephen Hemminger wrote:
> > On Fri,  6 Feb 2026 17:20:53 +0000
> > Kevin Traynor <[email protected]> wrote:
> >
> >> A busy-loop may occur when there are EPOLLERR, EPOLLHUP or
> EPOLLRDHUP
> >> epoll events for the devx interrupt fd.
> >>
> >> This may happen if the interrupt fd is deleted, if the device is
> >> unbound from mlx5_core kernel driver or if the device is removed by
> >> the mlx5 kernel driver as part of LAG setup.
> >>
> >> When that occurs, there is no data to be read and in the devx
> >> interrupt handler an EAGAIN is returned on the first call to
> >> devx_get_async_cmd_comp, but this is not checked.
> >>
> >> As the interrupt is not removed or condition reset, it causes an
> >> interrupt processing busy-loop, which leads to the dpdk-intr thread
> >> going to 100% CPU.
> >>
> >> e.g.
> >> epoll_wait
> >>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) =
> >> 1 read(28, 0x7f1f5c7fc2f0, 40)
> >>    = -1 EAGAIN (Resource temporarily unavailable) epoll_wait
> >>    (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) =
> >> 1 read(28, 0x7f1f5c7fc2f0, 40)
> >>    = -1 EAGAIN (Resource temporarily unavailable)
> >>
> >> Add a check for an EAGAIN return from devx_get_async_cmd_comp on the
> >> first read. If that happens, unregister the callback to prevent looping.
> >>
> >> Bugzilla ID: 1873
> >> Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
> >> Cc: [email protected]
> >>
> >> Signed-off-by: Kevin Traynor <[email protected]>
> >
> > AI spotted this, I didn't...
> >
> >
> > Errors:
> >
> >     Line 139: Unnecessary semicolon after closing brace
> >
> > c
> >
> >    };
> >
> > Should be:
> > c
> >
> >    }
> >
> >     Lines 142-146: Block comment uses incorrect style Block comments in C
> code should use /* and */ style, not /** which is reserved for documentation
> comments.
> >
> > c
> >
> >    /**
> >     * no data and EAGAIN indicate there is an error or
> >     * disconnect state. Unregister callback to prevent
> >     * interrupt busy-looping.
> >     */
> >
> > Should be:
> > c
> >
> >    /*
> >     * no data and EAGAIN indicate there is an error or
> >     * disconnect state. Unregister callback to prevent
> >     * interrupt busy-looping.
> >     */
> >
> > Warnings:
> >
> >     Logic clarity: The variable data_read is set to true inside the while 
> > loop but
> never checked when data WAS read. Consider if data_read is the clearest way to
> express this condition.
> >
> 
> Ack above. Thanks.Will be fixed in v3.

Reply via email to