On 6/15/2020 8:23 AM, Vladimir Oltean wrote:
> In the current design of the SO_SELECT_ERR_QUEUE socket option (which is
> enabled by sk_timestamping_init on the event fd), it is a bug to only
> check revents & POLLIN, but not also POLLERR.
>
> Normally the error queue events that the application expects (i.e. TX
> timestamps) are received, within a given timeout, in-band with the
> transmission of the timestampable message itself (for example in
> raw_send).
>
> For messages that the application does not / no longer expects, such as
> TX timestamps delivered late, duplicate TX timestamps, general
> exceptional messages enqueued by the kernel in the socket error queue
> etc, ptp4l will be taken by surprise in clock_poll() by these, and will
> think that there is data, since POLLIN is set (but in fact in the socket
> error queue etc, ptp4l will be taken by surprise in clock_poll() by
> these, and will think that there is data, since POLLIN is set (but in
> fact POLLERR is also set, and this has an entirely different meaning).
>
> A very, very simple reproducer is to take a DSA switch and run:
>
> tcpdump -i eth0 -j adapter_unsynced
>
> on its DSA master net device. The above command will enable timestamping
> on that net device, and if both the DSA switch and the master support
> PTP, this will make the kernel send duplicate TX timestamps for every
> sent event packet, which will completely kill ptp4l until a reboot, with
> no indication whatsoever of what's going on.
>
> Since the messages on the error queue are unexpected, we have no need
> for them. And they can be in theory anything, so simply hex dumping
> their content and moving along sounds like a good idea.
>
> Printing them to the user is optional (and helpful), but reading them is
> not. With this patch, even with extraneous data delivered by a buggy
> kernel (which the application now loudly complains about), the
> synchronization keeps chugging along. Otherwise the application starts
> reordering packets in recvmsg() due to misinterpreting which socket
> queue has data available.
>
> Signed-off-by: Vladimir Oltean <olte...@gmail.com>
> ---
> Changes in v2:
> Make msg_raw_dump print to stderr instead of stdout, matching the
> "Unexpected data on socket err queue:" text.
>
> clock.c | 11 +++++++++++
> msg.c | 12 ++++++++++++
> msg.h | 8 ++++++++
> 3 files changed, 31 insertions(+)
>
> diff --git a/clock.c b/clock.c
> index f43cc2afd49e..d61881c5db7a 100644
> --- a/clock.c
> +++ b/clock.c
> @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c)
> LIST_FOREACH(p, &c->ports, list) {
> /* Let the ports handle their events. */
> for (i = 0; i < N_POLLFD; i++) {
> + if (cur[i].revents & POLLERR) {
> + unsigned char pkt[1600];
> +
> + cnt = recv(cur[i].fd, pkt, sizeof(pkt),
> + MSG_ERRQUEUE);
> + pr_err("Unexpected data on socket err queue:");
> + msg_raw_dump(pkt, cnt, stderr);
> +
> + continue;
> + }
> +
> if (cur[i].revents & (POLLIN|POLLPRI)) {
> event = port_event(p, i);
> if (EV_STATE_DECISION_EVENT == event) {
> diff --git a/msg.c b/msg.c
> index d1619d4973f1..59bb55d6cd89 100644
> --- a/msg.c
> +++ b/msg.c
> @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m)
> }
> return msg_sots_valid(m) ? 0 : 1;
> }
> +
> +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp)
> +{
> + int k;
> +
> + for (k = 0; k < cnt; k++) {
> + if (k % 16 == 0)
> + fprintf(fp, "\n%04x ", k);
> + fprintf(fp, "%02x ", pkt[k]);
> + }
> + fprintf(fp, "\n");
> +}
Could this use the pr facility, so that it honors the printing options
to log to syslog and manages the level?
I'm not sure what I would mark these as: debug because they no longer
impact the syncing process, or true error because they are unexpected?
Thanks,
Jake
> diff --git a/msg.h b/msg.h
> index e71d3ceec4b9..f2165084395a 100644
> --- a/msg.h
> +++ b/msg.h
> @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m);
> */
> int msg_sots_missing(struct ptp_message *m);
>
> +/**
> + * Print a wireshark-compatible hex dump of a message buffer.
> + * @param pkt Message buffer to print
> + * @param cnt Length of message buffer
> + * @param fp An open file pointer.
> + */
> +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp);
> +
> /**
> * Test whether a message has a valid SO_TIMESTAMPING time stamp.
> * @param m Message to test.
>
_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel