On Thu, 29 Jan 2026 10:34:34 +0200
Shani Peretz <[email protected]> wrote:

> During cleanup, a race condition existed:
> 
>   Main Thread:                    Event Dispatch Thread:
>   1. Remove fds from fdset        while (1) {
>   2. Close file descriptors           epoll_wait() [gets interrupted]
>   3. rte_eal_cleanup()                [continues loop]
>   4. Unmap hugepages                  Accesses fdset...   CRASH
>                                   }
> 
> There was no explicit cleanup of the fdset structure.
> The fdset structure is allocated with rte_zmalloc() and the memory would
> only be reclaimed at application shutdown when rte_eal_cleanup() is called,
> which invokes rte_eal_memory_detach() to unmap all the hugepage memory.
> Meanwhile, the event dispatch thread could still be running and accessing
> the fdset.
> 
> The code had a `destroy` flag that the event dispatch thread checked,
> but it was never set during cleanup, and the code never waited for
> the thread to actually exit before freeing memory.
> 
> To fix this, the commit implements fdset_destroy() that sets the destroy
> flag with mutex protection, waits for thread termination, and cleans up
> all resources including the fdset memory.
> 
> Update socket.c to call fdset_destroy() when the last vhost-user socket
> is unregistered.
> 
> Fixes: 0e38b42bf61c ("vhost: manage FD with epoll")
> Cc: [email protected]
> 
> Signed-off-by: Shani Peretz <[email protected]>

It is preferable not to use POSIX mutexes in DPDK code.
Can this be done with regular locks or, better yet, stdatomic instead?
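
Something along these lines is what I have in mind for the stdatomic
variant. This is only a rough sketch, not tested against the vhost code:
demo_fdset, demo_dispatch, demo_fdset_start and demo_fdset_destroy are
made-up names, the struct is not the real fdset, and plain
pthread_create()/pthread_join() stand in for whatever thread API the
actual code uses. The busy loop stands in for epoll_wait() with a timeout.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vhost fdset; not the real structure. */
struct demo_fdset {
	atomic_bool destroy;	/* set by the owner to request shutdown */
	pthread_t tid;		/* dispatch thread handle (assumption: pthread) */
	unsigned long loops;	/* touched only by the dispatch thread until joined */
};

/* Dispatch loop: re-checks the flag on every iteration. */
static void *demo_dispatch(void *arg)
{
	struct demo_fdset *fds = arg;

	/* Acquire pairs with the release store in demo_fdset_destroy(). */
	while (!atomic_load_explicit(&fds->destroy, memory_order_acquire))
		fds->loops++;	/* stands in for epoll_wait() with a timeout */

	return NULL;
}

/* Initialize the flag and start the dispatch thread. Returns 0 on success. */
static int demo_fdset_start(struct demo_fdset *fds)
{
	atomic_init(&fds->destroy, false);
	fds->loops = 0;
	return pthread_create(&fds->tid, NULL, demo_dispatch, fds);
}

/* Request shutdown and wait for the thread. Returns 0 on success. */
static int demo_fdset_destroy(struct demo_fdset *fds)
{
	atomic_store_explicit(&fds->destroy, true, memory_order_release);

	/* Only after the join returns is it safe to free the fdset memory. */
	return pthread_join(fds->tid, NULL);
}
```

The flag itself needs no lock: the release store is seen by the acquire
load on the next loop iteration, and the join guarantees that the thread
no longer touches the fdset before the memory is released.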
