On Tue, Mar 13, 2018 at 4:35 PM, Vinicius Costa Gomes
<vinicius.go...@intel.com> wrote:
> Hi,
>
> Changes from the RFC:
>  - tweaked commit messages;
>
> Original cover letter:
>
> This is actually a "bug report"-RFC instead of the more usual "new
> feature"-RFC.
>
> We are developing an application that uses TX hardware timestamping to
> make some measurements, and during development Randy Witt initially
> reported that the application poll() never unblocked when TX hardware
> timestamping was enabled.
>
> After some investigation, it turned out the problem wasn't only
> exclusive to hardware timestamping, and could be reproduced with
> software timestamping.
>
> Applying patch (1), and running txtimestamp like this, for example:
>
> $ ./txtimestamp -u -4 192.168.1.71 -c 1000 -D -l 1000 -F
>
> ('-u' to use UDP only, '-4' for ipv4 only, '-c 1000' to send 1000
> packets for each test, '-D' to remove the delay between packets, '-l
> 1000' to set the payload to 1000 bytes, '-F' for configuring poll() to
> wait forever)
>
> will cause the application to become stuck in the poll() call in most
> of the times. (Note: I couldn't reproduce the issue running against an
> address that is routed through loopback.)
>
> Another interesting fact is that if the POLLIN event is added to the
> poll() .events, poll() no longer becomes stuck,

The process has registered interest only in POLLIN, which the call to
sk_data_read (sock_def_readable) will trigger.

> and more interestingly
> the returned event in .revents is only POLLERR.

datagram_poll will set (E)POLLERR based on non-empty sk_error_queue.

> After a few debugging sessions, we got to 'sock_queue_err_skb()' and
> how it notifies applications of the error just enqueued. Changing it
> to use 'sk->sk_error_report()', fixes the issue for hardware and
> software timestamping. That is patch (2).
>
> The "solution" proposed in patch (2) looks like too big a hammer,

It looks fine to me. POLLERR is returned regardless of the mask a
process sets up in pollfd.events. So waking with sk_error_report
will fix this while still waking callers waiting on POLLIN.

Note that on sock_dequeue_err_skb, if another notification (of the
right kind) is waiting, sk_error_report is already called instead of
sk_data_ready.

This should perhaps go to net, instead of net-next (though not the test).

If resending, a small nit in the test: please keep the alphabetical
order in getopt. The filepath also looks a bit fishy, but git am applied
the mbox from patchwork without issue.

Reply via email to