On Thu, 5 Feb 2026 at 19:35, Yehor Malikov <[email protected]> wrote:
>
> From: Yehor Malikov <[email protected]>
>
> The fdset_event_dispatch thread runs in a loop checking the destroy
> flag after each epoll_wait iteration. During process exit,
> rte_eal_cleanup() releases resources (like hugepages via
> rte_eal_memory_detach) while the fdset thread is still running.
> This race condition can lead to use-after-free errors if the thread
> accesses memory that has been freed.
>
> Standard destructors (RTE_FINI) run after rte_eal_cleanup() returns,
> which is too late to prevent this race.
>
> To address this, introduce a mechanism to register cleanup callbacks
> that run within rte_eal_cleanup() before memory is detached:
> 1. Add rte_eal_cleanup_register() API to EAL.
> 2. Implement fdset_deinit() in vhost to synchronously stop the
>    dispatch thread, close the epoll fd, and release resources.
> 3. Register the vhost cleanup handler during initialization to
>    ensure proper shutdown ordering via EAL.

As a first step, I suggest switching to libc allocations instead of DPDK
memory for the fdset objects (those are only used in the control path of
the vhost-user library). This should avoid the UAF you noticed and
restore the situation to what it was before the vhost library change you
pointed out.

Now, on this proposal for an EAL change, the fixed size of the callbacks
array makes little sense to me: in the case of fdset objects in vhost, we
only have two at the moment, but this could increase. On the other hand,
this is control path stuff, not performance sensitive. We can do some
allocations and go with a list of callbacks.

The cleanup callbacks as you propose are really vague. The problem here
is linked to control thread objects, so maybe we could add this cleanup
notion to the control thread creation itself. This notion has been
skipped since the introduction of control threads; maybe it is worth
investigating? The semantics must be sorted out, but it seems cleaner to
me.

-- 
David Marchand
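
To make the "list of callbacks" idea concrete, here is a rough sketch of a registry that allocates one entry per callback with libc malloc() instead of using a fixed-size array. Every name in it (cleanup_register, cleanup_run_all, bump_counter) is hypothetical and is not existing EAL API. It also assumes single-threaded registration and does no locking, which a real implementation would need to sort out.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical names throughout: none of this is existing EAL API. */
typedef void (*cleanup_cb_t)(void *arg);

struct cleanup_entry {
	TAILQ_ENTRY(cleanup_entry) next;
	cleanup_cb_t cb;
	void *arg;
};

TAILQ_HEAD(cleanup_list, cleanup_entry);
static struct cleanup_list cleanup_cbs = TAILQ_HEAD_INITIALIZER(cleanup_cbs);

/* Register a callback. Entries come from libc malloc(), not hugepages,
 * so the list itself stays valid after DPDK memory is detached. */
static int
cleanup_register(cleanup_cb_t cb, void *arg)
{
	struct cleanup_entry *e;

	if (cb == NULL)
		return -EINVAL;
	e = malloc(sizeof(*e));
	if (e == NULL)
		return -ENOMEM;
	e->cb = cb;
	e->arg = arg;
	/* Insert at head so callbacks run in reverse registration order. */
	TAILQ_INSERT_HEAD(&cleanup_cbs, e, next);
	return 0;
}

/* Run and free every registered callback. In the scheme discussed here
 * this would be called before memory is detached. No locking is done. */
static void
cleanup_run_all(void)
{
	struct cleanup_entry *e;

	while ((e = TAILQ_FIRST(&cleanup_cbs)) != NULL) {
		TAILQ_REMOVE(&cleanup_cbs, e, next);
		e->cb(e->arg);
		free(e);
	}
}

/* Example callback: increments the int counter passed as arg. */
static void
bump_counter(void *arg)
{
	int *counter = arg;

	(*counter)++;
}
```

Running the callbacks in reverse registration order is only one possible choice, picked here because later-initialized components often depend on earlier ones. Whether the hook lives in EAL or is attached to control thread creation is a separate question that this sketch does not settle.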

