Hi, Willem de Bruijn <[email protected]> writes:
>> Another interesting fact is that if the POLLIN event is added to the >> poll() .events, poll() no longer becomes stuck, > > The process has registered interest only in POLLIN, which the call to > sk_data_read (sock_def_readable) will trigger. > >> and more interestingly >> the returned event in .revents is only POLLERR. > > datagram_poll will set (E)POLLERR based on non-empty sk_error_queue. > >> After a few debugging sessions, we got to 'sock_queue_err_skb()' and >> how it notifies applications of the error just enqueued. Changing it >> to use 'sk->sk_error_report()', fixes the issue for hardware and >> software timestamping. That is patch (2). >> >> The "solution" proposed in patch (2) looks like too big a hammer, > > It looks fine to me. POLLERR is returned regardless of the mask a > process sets up in pollfd.events. So waking with sk_error_report > will fix this while still waking callers waiting on POLLIN. > > Note that on sock_dequeue_err_skb, if another notification (of the > right kind) is waiting, sk_error_report is already called instead of > sk_data_ready. Thank you all who did confirm that this was the right fix. > > This should perhaps go to net, instead of net-next (though not the > test). Will propose it to net, what I am thinking is that now that a few TSN features are upstream, applications that use hardware timestamping (the easiest way to trigger this bug) could be more common. > > If resending, a small nit in the test: please keep the alphabetical > order in getopt. The filepath also looks a bit fishy, but git am applied > the mbox from patchwork without issue. Will send a second version. Thanks, -- Vinicius
