http://bugs.dpdk.org/show_bug.cgi?id=1873

            Bug ID: 1873
           Summary: eal interrupt: fd error conditions not handled
           Product: DPDK
           Version: 26.03
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: core
          Assignee: [email protected]
          Reporter: [email protected]
  Target Milestone: ---

The DPDK interrupt handling thread enters a busy-loop when epoll error
conditions (EPOLLERR, EPOLLHUP, EPOLLRDHUP) occur on interrupt file
descriptors. As a result, the dpdk-intr thread runs at 100% CPU.

When an interrupt file descriptor enters an error state or experiences a
hangup, the interrupt handling code in eal_intr_process_interrupts() does not
properly detect or handle these conditions. The epoll events EPOLLERR,
EPOLLHUP, and EPOLLRDHUP indicate critical issues with the file descriptor, but
the current code attempts to read from the fd without first checking for these
error conditions.    

This results in:
  1. The interrupt continuing to fire repeatedly
  2. The error condition never being cleared
  3. The interrupt thread entering a busy-loop, consuming 100% CPU on a core 
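
To make the missing check concrete, here is a minimal sketch of the kind of
test that could be applied to the epoll event mask before attempting a read.
The helper name intr_fd_has_error() is hypothetical and is not part of the EAL
API; it only illustrates the mask test described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/*
 * Hypothetical helper (not an existing EAL function): returns true when the
 * event mask reported by epoll_wait() carries any of the error or hangup
 * conditions named in this report.
 */
bool
intr_fd_has_error(uint32_t events)
{
    return (events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) != 0;
}
```

With the mask seen in the strace output further down (EPOLLIN|EPOLLRDHUP),
such a check returns true even though EPOLLIN is also set, so the fd would be
treated as failed rather than readable.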

This is a generic issue that could happen to any interrupt read directly in the
eal interrupt code or passed to a registered handler for external processing.

I have reproduced this issue in multiple ways, such as unbinding a device,
bonding a device, or manually deleting the fd.

To give an example of a normal use case where this might occur:
- A DPDK application starts and, during EAL init, some mlx devices are probed.
- These devices are not used by DPDK, but are later used to form a Linux bond.
- As part of LAG setup, the mlx5_core kernel driver removes the current devices
and fds.
- Removal of the interrupt fd triggers an mlx devx interrupt with EPOLLRDHUP in
DPDK.
- This is passed to the mlx devx interrupt handler, which does a read and gets
EAGAIN.
- As the condition is not cleared, the next epoll_wait returns immediately with
the same event, and so the thread loops.
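
The last two steps can be imitated without any mlx hardware. The sketch below
is only a stand-in under stated assumptions: it uses an AF_UNIX socketpair
instead of a devx fd, and the function name count_wakeups_after_hangup() is
made up for illustration. After the peer end is closed, a level-triggered
epoll_wait() keeps reporting the fd on every call. One difference from the
trace in this report: on a socket the read() returns 0 (end of file) rather
than EAGAIN, but in both cases the read does not clear the condition.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Count how many of `attempts` consecutive epoll_wait() calls report the fd
 * as ready after its peer has gone away. Returns -1 on setup failure.
 * A timeout of 0 is used so the function can never block.
 */
int
count_wakeups_after_hangup(int attempts)
{
    int sv[2];
    int epfd;
    int wakeups = 0;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
    struct epoll_event out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    epfd = epoll_create1(0);
    ev.data.fd = sv[0];
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
        if (epfd >= 0)
            close(epfd);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The peer goes away, much like the kernel driver removing the fd. */
    close(sv[1]);

    for (int i = 0; i < attempts; i++) {
        char buf[40];
        ssize_t n;

        /* Level-triggered: the hangup is reported again on every call. */
        if (epoll_wait(epfd, &out, 1, 0) != 1)
            break;
        wakeups++;

        /* The read yields no data and leaves the condition in place. */
        n = read(out.data.fd, buf, sizeof(buf));
        if (n > 0)
            break;
    }

    close(epfd);
    close(sv[0]);
    return wakeups;
}
```

Calling count_wakeups_after_hangup(1000) should report a wakeup on every
iteration, which is the same pattern as the epoll_wait/read pairs in the
strace output, only bounded by the loop count instead of running forever.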


This issue can also occur with devices that are used in DPDK, although
unbinding an mlx device from the mlx5_core kernel driver while it is in use
would not normally be expected.

Testpmd has some additional handling of RTE_ETH_EVENT_INTR_RMV which in some
cases may break the busy-loop by detaching the device, but this will not work
in every case and an application should not have to register for this event and
rely on it triggering in order to resolve the issue.

Example to reproduce:

# dpdk-testpmd -l 8,9 -- -i
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
Interactive-mode selected
testpmd: Flow tunnel offload support might be limited or unavailable on port 0
testpmd: Flow tunnel offload support might be limited or unavailable on port 1
testpmd: create a new mbuf pool <mb_pool_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=155456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 04:3F:72:C2:07:B8
Configuring Port 1 (socket 0)
Port 1: 04:3F:72:C2:07:B9
Checking link statuses...
Done

testpmd>

# top -H -p $(pidof dpdk-testpmd)
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 164848 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:01.08 dpdk-testpmd
 164849 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-intr
 164850 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-mp-msg
 164851 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-worker9
 164852 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-telemet-v2

testpmd> port stop 0

# top -H -p $(pidof dpdk-testpmd)
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 164848 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:01.08 dpdk-testpmd
 164849 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-intr
 164850 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-mp-msg
 164851 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-worker9
 164852 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-telemet-v2

# echo "0000:06:00.0" > /sys/bus/pci/drivers/mlx5_core/unbind

# top -H -p $(pidof dpdk-testpmd)
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 164849 root      20   0  128.5g 434112  75264 R  99.9   0.2   0:07.87 dpdk-intr
 164848 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:01.08 dpdk-testpmd
 164850 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-mp-msg
 164851 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-worker9
 164852 root      20   0  128.5g 434112  75264 S   0.0   0.2   0:00.00 dpdk-telemet-v2


# strace -p $(ps -T -p $(pidof dpdk-testpmd) | grep dpdk-intr | awk '{print $2}')

epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)            = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)            = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)            = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)            = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
read(28, 0x7f1f5c7fc2f0, 40)            = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(6, [{events=EPOLLIN|EPOLLRDHUP, data={u32=28, u64=28}}], 8, -1) = 1
<snip>

I will shortly send an EAL patch that checks for EPOLLERR, EPOLLHUP and
EPOLLRDHUP in the fd events and removes any interrupts that report these
events, preventing a busy loop.
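
As a rough illustration of that idea, and not of the actual patch, the sketch
below checks the event mask on each wakeup and drops the fd from the epoll set
with EPOLL_CTL_DEL when an error or hangup is reported. It again assumes an
AF_UNIX socketpair as a stand-in for an interrupt fd, and
wakeups_with_error_removal() is a hypothetical name. The real EAL change would
also have to deal with the registered interrupt sources and callbacks, which
this sketch leaves out.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Same setup as a hangup reproduction, but the loop removes the fd from the
 * epoll set as soon as an error or hangup event is seen. Returns the number
 * of wakeups observed in `attempts` polls, or -1 on setup failure.
 */
int
wakeups_with_error_removal(int attempts)
{
    int sv[2];
    int epfd;
    int wakeups = 0;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP };
    struct epoll_event out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    epfd = epoll_create1(0);
    ev.data.fd = sv[0];
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
        if (epfd >= 0)
            close(epfd);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[1]); /* trigger the hangup condition on sv[0] */

    for (int i = 0; i < attempts; i++) {
        char buf[40];
        ssize_t n;

        /* Timeout 0: never block, just ask whether anything is pending. */
        if (epoll_wait(epfd, &out, 1, 0) != 1)
            break;
        wakeups++;

        if (out.events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
            /* Stop monitoring the failed fd instead of reading from it. */
            epoll_ctl(epfd, EPOLL_CTL_DEL, out.data.fd, NULL);
            continue;
        }

        n = read(out.data.fd, buf, sizeof(buf));
        if (n <= 0)
            break;
    }

    close(epfd);
    close(sv[0]);
    return wakeups;
}
```

In this stand-in the hangup is seen once, the fd is removed, and the next
epoll_wait() reports nothing, so the loop ends after a single wakeup however
large `attempts` is.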
