On 6/15/2020 8:23 AM, Vladimir Oltean wrote:
> In the current design of the SO_SELECT_ERR_QUEUE socket option (which is
> enabled by sk_timestamping_init on the event fd), it is a bug to only
> check revents & POLLIN, but not also POLLERR.
> 
> Normally the error queue events that the application expects (i.e. TX
> timestamps) are received, within a given timeout, in-band with the
> transmission of the timestampable message itself (for example in
> raw_send).
> 
> For messages that the application does not / no longer expects, such as
> TX timestamps delivered late, duplicate TX timestamps, general
> exceptional messages enqueued by the kernel in the socket error queue
> etc, ptp4l will be taken by surprise in clock_poll() by these, and will
> think that there is data, since POLLIN is set (but in fact in the socket
> error queue etc, ptp4l will be taken by surprise in clock_poll() by
> these, and will think that there is data, since POLLIN is set (but in
> fact POLLERR is also set, and this has an entirely different meaning).
> 
> A very, very simple reproducer is to take a DSA switch and run:
> 
> tcpdump -i eth0 -j adapter_unsynced
> 
> on its DSA master net device. The above command will enable timestamping
> on that net device, and if both the DSA switch and the master support
> PTP, this will make the kernel send duplicate TX timestamps for every
> sent event packet, which will completely kill ptp4l until a reboot, with
> no indication whatsoever of what's going on.
> 
> Since the messages on the error queue are unexpected, we have no need
> for them. And they can be in theory anything, so simply hex dumping
> their content and moving along sounds like a good idea.
> 
> Printing them to the user is optional (and helpful), but reading them is
> not. With this patch, even with extraneous data delivered by a buggy
> kernel (which the application now loudly complains about), the
> synchronization keeps chugging along. Otherwise the application starts
> reordering packets in recvmsg() due to misinterpreting which socket
> queue has data available.
> 
> Signed-off-by: Vladimir Oltean <olte...@gmail.com>
> ---
> Changes in v2:
> Make msg_raw_dump print to stderr instead of stdout, matching the
> "Unexpected data on socket err queue:" text.
> 
>  clock.c | 11 +++++++++++
>  msg.c   | 12 ++++++++++++
>  msg.h   |  8 ++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/clock.c b/clock.c
> index f43cc2afd49e..d61881c5db7a 100644
> --- a/clock.c
> +++ b/clock.c
> @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c)
>       LIST_FOREACH(p, &c->ports, list) {
>               /* Let the ports handle their events. */
>               for (i = 0; i < N_POLLFD; i++) {
> +                     if (cur[i].revents & POLLERR) {
> +                             unsigned char pkt[1600];
> +
> +                             cnt = recv(cur[i].fd, pkt, sizeof(pkt),
> +                                        MSG_ERRQUEUE);
> +                             pr_err("Unexpected data on socket err queue:");
> +                             msg_raw_dump(pkt, cnt, stderr);
> +
> +                             continue;
> +                     }
> +
>                       if (cur[i].revents & (POLLIN|POLLPRI)) {
>                               event = port_event(p, i);
>                               if (EV_STATE_DECISION_EVENT == event) {
> diff --git a/msg.c b/msg.c
> index d1619d4973f1..59bb55d6cd89 100644
> --- a/msg.c
> +++ b/msg.c
> @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m)
>       }
>       return msg_sots_valid(m) ? 0 : 1;
>  }
> +
> +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp)
> +{
> +     int k;
> +
> +     for (k = 0; k < cnt; k++) {
> +             if (k % 16 == 0)
> +                     fprintf(fp, "\n%04x ", k);
> +             fprintf(fp, "%02x ", pkt[k]);
> +     }
> +     fprintf(fp, "\n");
> +}

Could this use the pr facility, so that it honors the printing options
to log to syslog and manages the level?

I'm not sure what I would mark these as: debug because they no longer
impact the syncing process, or true error because they are unexpected?

Thanks,
Jake

> diff --git a/msg.h b/msg.h
> index e71d3ceec4b9..f2165084395a 100644
> --- a/msg.h
> +++ b/msg.h
> @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m);
>   */
>  int msg_sots_missing(struct ptp_message *m);
>  
> +/**
> + * Print a wireshark-compatible hex dump of a message buffer.
> + * @param pkt  Message buffer to print
> + * @param cnt  Length of message buffer
> + * @param fp   An open file pointer.
> + */
> +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp);
> +
>  /**
>   * Test whether a message has a valid SO_TIMESTAMPING time stamp.
>   * @param m  Message to test.
> 


_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Reply via email to