On Wed, Mar 14, 2018 at 12:32 PM Willem de Bruijn < willemdebruijn.ker...@gmail.com> wrote:
> On Tue, Mar 13, 2018 at 4:35 PM, Vinicius Costa Gomes > <vinicius.go...@intel.com> wrote: > > Hi, > > > > Changes from the RFC: > > - tweaked commit messages; > > > > Original cover letter: > > > > This is actually a "bug report"-RFC instead of the more usual "new > > feature"-RFC. > > > > We are developing an application that uses TX hardware timestamping to > > make some measurements, and during development Randy Witt initially > > reported that the application poll() never unblocked when TX hardware > > timestamping was enabled. > > > > After some investigation, it turned out the problem wasn't only > > exclusive to hardware timestamping, and could be reproduced with > > software timestamping. > > > > Applying patch (1), and running txtimestamp like this, for example: > > > > $ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F > > > > ('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000 > > packets for each test, '-D' to remove the delay between packets, '-l > > 1000' to set the payload to 1000 bytes, '-F' for configuring poll() to > > wait forever) > > > > will cause the application to become stuck in the poll() call in most > > of the times. (Note: I couldn't reproduce the issue running against an > > address that is routed through loopback.) > > > > Another interesting fact is that if the POLLIN event is added to the > > poll() .events, poll() no longer becomes stuck, > The process has registered interest only in POLLIN, which the call to > sk_data_read (sock_def_readable) will trigger. > > and more interestingly > > the returned event in .revents is only POLLERR. > datagram_poll will set (E)POLLERR based on non-empty sk_error_queue. > > After a few debugging sessions, we got to 'sock_queue_err_skb()' and > > how it notifies applications of the error just enqueued. Changing it > > to use 'sk->sk_error_report()', fixes the issue for hardware and > > software timestamping. That is patch (2). > > > > The "solution" proposed in patch (2) looks like too big a hammer, > It looks fine to me. POLLERR is returned regardless of the mask a > process sets up in pollfd.events. So waking with sk_error_report > will fix this while still waking callers waiting on POLLIN. > Note that on sock_dequeue_err_skb, if another notification (of the > right kind) is waiting, sk_error_report is already called instead of > sk_data_ready. Thank you for the fix. It looks fine to me too, because we already only arm POLLERR for sock_dequeue_err_skb, and POLLERR is always returned when error queue is not empty: if (...skb_queue_empty(&sk->sk_error_queue)) mask |= POLLERR ... > This should perhaps go to net, instead of net-next (though not the test). +1 I think the fix should go to net. > If resending, a small nit in the test: please keep the alphabetical > order in getopt. The filepath also looks a bit fishy, but git am applied > the mbox from patchwork without issue.