A busy-loop may occur when there are disconnect/error events
such as EPOLLERR, EPOLLHUP or EPOLLRDHUP on Linux for the devx
interrupt fd.

This may happen if the interrupt fd is deleted, if the device
is unbound from the mlx5_core kernel driver, or if the device is
removed by the mlx5 kernel driver as part of LAG setup.

Because the interrupt is not removed and the condition is not
reset, the interrupt is processed in a busy-loop, which drives the
dpdk-intr thread to 100% CPU.

e.g.
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait
   (6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)
   = -1 EAGAIN (Resource temporarily unavailable)

To prevent the busy-loop, use the EAL API
rte_intr_active_events_flags() to get the active interrupt events
and check for disconnect/error.

If there is a disconnect/error event, unregister the devx callback.
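
The underlying behaviour can be reproduced outside DPDK. The sketch below
is not the driver code and does not use the EAL interrupt API: it uses
plain Linux epoll on a pipe whose write end is closed, and the helper name
count_wakeups() is hypothetical. Removing the fd from the epoll set stands
in for unregistering the devx callback.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of DPDK or the mlx5 driver.
 *
 * Polls the read end of a pipe whose write end has been closed, for at
 * most max_iters iterations, and returns how many times epoll_wait()
 * reported the fd as ready (or -1 on setup failure).
 *
 * If remove_on_hup is non-zero, the fd is dropped from the epoll set on
 * the first EPOLLHUP/EPOLLRDHUP/EPOLLERR event, which stands in for
 * unregistering the interrupt callback.
 */
static int
count_wakeups(int remove_on_hup, int max_iters)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
	struct epoll_event out;
	int pipefd[2];
	int epfd;
	int wakeups = 0;
	int i;

	if (pipe(pipefd) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
		close(epfd);
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	/* Closing the writer leaves a permanent hang-up on the read end. */
	close(pipefd[1]);

	for (i = 0; i < max_iters; i++) {
		/* Timeout 0: return immediately instead of blocking. */
		if (epoll_wait(epfd, &out, 1, 0) <= 0)
			break;
		wakeups++;
		if (remove_on_hup &&
		    (out.events & (EPOLLHUP | EPOLLRDHUP | EPOLLERR))) {
			/* Reading cannot clear the condition; stop polling the fd. */
			epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL);
		}
	}

	close(epfd);
	close(pipefd[0]);
	return wakeups;
}
```

With the fd left registered, count_wakeups(0, 5) reports a wakeup on every
iteration, because the hang-up condition is level-triggered and never
clears. With removal enabled, count_wakeups(1, 5) sees a single wakeup and
then the loop goes quiet.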

Bugzilla ID: 1873
Fixes: f15db67df09c ("net/mlx5: accelerate DV flow counter query")
Cc: [email protected]

Signed-off-by: Kevin Traynor <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Acked-by: Viacheslav Ovsiienko <[email protected]>
---
 drivers/net/mlx5/linux/mlx5_ethdev_os.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 18819a4a0f..4bbc590e91 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -860,4 +860,24 @@ mlx5_dev_interrupt_handler_devx(void *cb_arg)
        } out;
        uint8_t *buf = out.buf + sizeof(out.cmd_resp);
+       uint32_t events = rte_intr_active_events_flags();
+
+       if (events & (RTE_INTR_EVENT_HUP | RTE_INTR_EVENT_RDHUP | RTE_INTR_EVENT_ERR)) {
+               /*
+                * Disconnect or Error event that cannot be cleared by reading.
+                * Unregister callback to prevent interrupt busy-looping.
+                */
+               DRV_LOG(WARNING, "disconnect or error event for mlx5 devx interrupt on fd %d"
+                       " (events=0x%x)",
+                       rte_intr_fd_get(sh->intr_handle_devx), events);
+
+               if (rte_intr_callback_unregister_pending(sh->intr_handle_devx,
+                                                        mlx5_dev_interrupt_handler_devx,
+                                                        (void *)sh, NULL) < 0) {
+                       DRV_LOG(WARNING,
+                               "unable to unregister mlx5 devx interrupt callback on fd %d",
+                               rte_intr_fd_get(sh->intr_handle_devx));
+               }
+               return;
+       }
 
        while (!mlx5_glue->devx_get_async_cmd_comp(sh->devx_comp,
-- 
2.53.0
