http://bugs.dpdk.org/show_bug.cgi?id=1873
Bug ID: 1873
Summary: eal interrupt: fd error conditions not handled
Product: DPDK
Version: 26.03
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: core
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: ---
The DPDK interrupt handling thread enters a busy-loop when epoll error
conditions (EPOLLERR, EPOLLHUP, EPOLLRDHUP) occur on interrupt file
descriptors. This means that the dpdk-intr thread will run at 100% cpu
consumption.
When an interrupt file descriptor enters an error state or experiences a
hangup, the interrupt handling code in eal_intr_process_interrupts() does not
properly detect or handle these conditions. The epoll events EPOLLERR,
EPOLLHUP, and EPOLLRDHUP indicate critical issues with the file descriptor, but
the current code attempts to read from the fd without first checking for these
error conditions.
This results in:
1. The interrupt continuing to fire repeatedly
2. The error condition never being cleared
3. The interrupt thread entering a busy-loop, consuming 100% CPU on a core
This is a generic issue that could happen to any interrupt, whether it is read
directly in the EAL interrupt code or passed to a registered handler for
external processing.
I have reproduced this issue in multiple ways, such as unbinding a device,
bonding a device, and manually deleting the fd.
To give an example of a normal use case where this might occur:
- A DPDK application starts and, during EAL init, some mlx devices are probed.
- These devices are not used by DPDK, but are later used to form a Linux bond.
- As part of LAG setup, the mlx5_core kernel driver removes the current devices
and fds.
- Removal of the interrupt fd triggers an mlx devx interrupt with EPOLLRDHUP in
DPDK.
- This is passed to the mlx devx interrupt handler, which does a read and gets
an EAGAIN.
- As the condition is not cleared, the next epoll_wait returns immediately with
the same event, and so we loop.
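The last step can be illustrated outside DPDK with a minimal sketch. It is not
DPDK code: the helper name count_hangup_wakeups() is hypothetical, and an
anonymous pipe stands in for the mlx devx fd. Note one difference from the
trace in this report: a read() on a hung-up pipe returns 0 (EOF) rather than
EAGAIN. In both cases the read does not clear the condition, so a
level-triggered epoll_wait keeps reporting the fd.

```c
#include <assert.h>	/* only needed by callers that assert on the result */
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of DPDK. Adds the read end of a pipe to an
 * epoll set, hangs it up by closing the write end, then polls `iterations`
 * times without ever removing the fd from the set.
 * Returns the number of polls that reported EPOLLHUP, or -1 on setup failure.
 */
static int count_hangup_wakeups(int iterations)
{
	int pipefd[2];
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	char buf[8];
	int epfd, i, ready = 0;

	if (pipe(pipefd) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
		close(epfd);
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	/* Closing the only writer puts the read end into a hangup state. */
	close(pipefd[1]);

	for (i = 0; i < iterations; i++) {
		/* Timeout 0: never block, just ask whether the fd is ready. */
		if (epoll_wait(epfd, &out, 1, 0) != 1 ||
		    !(out.events & EPOLLHUP))
			break;
		ready++;
		/* Reading returns 0 (EOF) and does not clear the hangup. */
		if (read(pipefd[0], buf, sizeof(buf)) != 0)
			break;
	}

	close(pipefd[0]);
	close(epfd);
	return ready;
}
```

On Linux, count_hangup_wakeups(5) should report the fd as ready on all five
polls. With a blocking timeout of -1, as used by the dpdk-intr thread, the same
behaviour turns into a loop that never sleeps.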
This issue can also occur with devices that are used in DPDK, although
unbinding a mlx device from the mlx5_core kernel driver while it is in use
would not be expected.
Testpmd has some additional handling of RTE_ETH_EVENT_INTR_RMV which in some
cases may break the busy-loop by detaching the device, but this will not work
in every case and an application should not have to register for this event and
rely on it triggering in order to resolve the issue.
Example to reproduce:
# dpdk-testpmd -l 8,9 -- -i
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
Interactive-mode selected
testpmd: Flow tunnel offload support might be limited or unavailable on port 0
testpmd: Flow tunnel offload support might be limited or unavailable on port 1
testpmd: create a new mbuf pool <mb_pool_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=155456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 04:3F:72:C2:07:B8
Configuring Port 1 (socket 0)
Port 1: 04:3F:72:C2:07:B9
Checking link statuses...
Done
testpmd>
# top -H -p $(pidof dpdk-testpmd)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
164848 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:01.08 dpdk-testpmd
164849 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-intr
164850 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-mp-msg
164851 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-worker9
164852 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-telemet-v2
testpmd> port stop 0
# top -H -p $(pidof dpdk-testpmd)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
164848 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:01.08 dpdk-testpmd
164849 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-intr
164850 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-mp-msg
164851 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-worker9
164852 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-telemet-v2
# echo "0000:06:00.0" > /sys/bus/pci/drivers/mlx5_core/unbind
# top -H -p $(pidof dpdk-testpmd)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
164849 root 20 0 128.5g 434112 75264 R 99.9 0.2 0:07.87 dpdk-intr
164848 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:01.08 dpdk-testpmd
164850 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-mp-msg
164851 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-worker9
164852 root 20 0 128.5g 434112 75264 S 0.0 0.2 0:00.00 dpdk-telemet-v2
# strace -p $(ps -T -p $(pidof dpdk-testpmd) | grep dpdk-intr | awk '{print $2}')
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
<snip>
I will shortly send an EAL patch that checks for EPOLLERR, EPOLLHUP, and
EPOLLRDHUP in the fd events and removes any interrupts that have these events,
preventing a busy loop.
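As a rough illustration of that approach, and not the actual patch, the sketch
below checks an event against the three error flags and removes the fd from
the epoll set when any of them is present. The names FD_ERROR_EVENTS,
drop_fd_on_error() and wakeups_after_drop() are hypothetical and do not exist
in DPDK; a pipe is again used as a stand-in for an interrupt fd. A real fix in
eal_intr_process_interrupts() would also have to deal with the registered
interrupt source and its callbacks, which this sketch ignores.

```c
#include <assert.h>	/* only needed by callers that assert on the result */
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical mask of the events that indicate a dead or failed fd. */
#define FD_ERROR_EVENTS (EPOLLERR | EPOLLHUP | EPOLLRDHUP)

/*
 * Hypothetical helper, not part of DPDK.
 * Returns 0 if the event carries no error flag (nothing is done),
 * 1 if the fd was removed from the epoll set, -1 if removal failed.
 */
static int drop_fd_on_error(int epfd, const struct epoll_event *ev)
{
	if ((ev->events & FD_ERROR_EVENTS) == 0)
		return 0;
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, ev->data.fd, NULL) < 0)
		return -1;
	return 1;
}

/*
 * Demo: hang up a pipe, let drop_fd_on_error() handle the first event, then
 * poll again. Returns the number of events seen by the second poll, which
 * should be 0 once the fd has been removed, or -1 on any failure.
 */
static int wakeups_after_drop(void)
{
	int pipefd[2];
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	int epfd, ret = -1;

	if (pipe(pipefd) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0) {
		/* Closing the only writer raises EPOLLHUP on the read end. */
		close(pipefd[1]);
		pipefd[1] = -1;
		if (epoll_wait(epfd, &out, 1, 0) == 1 &&
		    drop_fd_on_error(epfd, &out) == 1)
			ret = epoll_wait(epfd, &out, 1, 0);
	}

	if (pipefd[1] >= 0)
		close(pipefd[1]);
	close(pipefd[0]);
	close(epfd);
	return ret;
}
```

With the fd removed, the second epoll_wait has nothing left to report, so a
thread blocking with a timeout of -1 would go back to sleep instead of
spinning.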