On Fri, Jul 18, 2025 at 10:59:48AM +0200, Nam Cao wrote:
> On Fri, Jul 18, 2025 at 09:38:27AM +0100, Soheil Hassas Yeganeh wrote:
> > On Fri, Jul 18, 2025 at 8:52 AM Nam Cao <[email protected]> wrote:
> > >
> > > ep_events_available() checks for available events by looking at
> > > ep->rdllist and ep->ovflist. However, this is done without a lock,
> > > therefore the returned value is not reliable. Because it is possible that
> > > both checks on ep->rdllist and ep->ovflist are false while ep_start_scan()
> > > or ep_done_scan() is being executed on other CPUs, despite events are
> > > available.
> > >
> > > This bug can be observed by:
> > >
> > > 1. Create an eventpoll with at least one ready level-triggered event
> > >
> > > 2. Create multiple threads who do epoll_wait() with zero timeout. The
> > > threads do not consume the events, therefore all epoll_wait() should
> > > return at least one event.
> > >
> > > If one thread is executing ep_events_available() while another thread is
> > > executing ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly
> > > return no event for the former thread.
> >
> > That is the whole point of epoll_wait with a zero timeout. We would want to
> > opportunistically poll without much overhead, which will have more
> > false positives.
> > A caller that calls with a zero timeout should retry later, and will
> > at some point observe the event.
>
> Is this a documented behavior that users expect? I do not see this in the
> man page.

The selftests rely on this behavior: epoll_wait() with timeout=0 sees events
from a concurrently running producer. They would fail at a much higher rate
after this change. Believe me, I had a similar patch that changed something in
this area. I would explore the seqcount that Mateusz suggested, tbh.

