Hi Jacob,

On Thu, 18 Jun 2020 at 22:44, Jacob Keller <jacob.e.kel...@intel.com> wrote:
>
>
>
> On 6/15/2020 8:23 AM, Vladimir Oltean wrote:
> > In the current design of the SO_SELECT_ERR_QUEUE socket option (which is
> > enabled by sk_timestamping_init on the event fd), it is a bug to only
> > check revents & POLLIN, but not also POLLERR.
> >
> > Normally the error queue events that the application expects (i.e. TX
> > timestamps) are received, within a given timeout, in-band with the
> > transmission of the timestampable message itself (for example in
> > raw_send).
> >
> > For messages that the application does not / no longer expects, such as
> > TX timestamps delivered late, duplicate TX timestamps, general
> > exceptional messages enqueued by the kernel in the socket error queue
> > etc, ptp4l will be taken by surprise in clock_poll() by these, and will
> > think that there is data, since POLLIN is set (but in fact in the socket
> > error queue etc, ptp4l will be taken by surprise in clock_poll() by
> > these, and will think that there is data, since POLLIN is set (but in
> > fact POLLERR is also set, and this has an entirely different meaning).
> >
> > A very, very simple reproducer is to take a DSA switch and run:
> >
> > tcpdump -i eth0 -j adapter_unsynced
> >
> > on its DSA master net device. The above command will enable timestamping
> > on that net device, and if both the DSA switch and the master support
> > PTP, this will make the kernel send duplicate TX timestamps for every
> > sent event packet, which will completely kill ptp4l until a reboot, with
> > no indication whatsoever of what's going on.
> >
> > Since the messages on the error queue are unexpected, we have no need
> > for them. And they can be in theory anything, so simply hex dumping
> > their content and moving along sounds like a good idea.
> >
> > Printing them to the user is optional (and helpful), but reading them is
> > not. With this patch, even with extraneous data delivered by a buggy
> > kernel (which the application now loudly complains about), the
> > synchronization keeps chugging along. Otherwise the application starts
> > reordering packets in recvmsg() due to misinterpreting which socket
> > queue has data available.
> >
> > Signed-off-by: Vladimir Oltean <olte...@gmail.com>
> > ---
> > Changes in v2:
> > Make msg_raw_dump print to stderr instead of stdout, matching the
> > "Unexpected data on socket err queue:" text.
> >
> >  clock.c | 11 +++++++++++
> >  msg.c   | 12 ++++++++++++
> >  msg.h   |  8 ++++++++
> >  3 files changed, 31 insertions(+)
> >
> > diff --git a/clock.c b/clock.c
> > index f43cc2afd49e..d61881c5db7a 100644
> > --- a/clock.c
> > +++ b/clock.c
> > @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c)
> >       LIST_FOREACH(p, &c->ports, list) {
> >               /* Let the ports handle their events. */
> >               for (i = 0; i < N_POLLFD; i++) {
> > +                     if (cur[i].revents & POLLERR) {
> > +                             unsigned char pkt[1600];
> > +
> > +                             cnt = recv(cur[i].fd, pkt, sizeof(pkt),
> > +                                        MSG_ERRQUEUE);
> > +                             pr_err("Unexpected data on socket err 
> > queue:");
> > +                             msg_raw_dump(pkt, cnt, stderr);
> > +
> > +                             continue;
> > +                     }
> > +
> >                       if (cur[i].revents & (POLLIN|POLLPRI)) {
> >                               event = port_event(p, i);
> >                               if (EV_STATE_DECISION_EVENT == event) {
> > diff --git a/msg.c b/msg.c
> > index d1619d4973f1..59bb55d6cd89 100644
> > --- a/msg.c
> > +++ b/msg.c
> > @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m)
> >       }
> >       return msg_sots_valid(m) ? 0 : 1;
> >  }
> > +
> > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp)
> > +{
> > +     int k;
> > +
> > +     for (k = 0; k < cnt; k++) {
> > +             if (k % 16 == 0)
> > +                     fprintf(fp, "\n%04x ", k);
> > +             fprintf(fp, "%02x ", pkt[k]);
> > +     }
> > +     fprintf(fp, "\n");
> > +}
>
> Could this use the pr facility, so that it honors the printing options
> to log to syslog and manages the level?
>
> I'm not sure what I would mark these as: debug because they no longer
> impact the syncing process, or true error because they are unexpected?
>
> Thanks,
> Jake
>

Thanks a lot for looking at this.
Good point about redirecting to syslog, I did not think about that.
The reason why I did not use pr_err was due to the automatic line
ending, which would have unnecessarily complicated this function by
having me print each line of the packet to a temporary buffer first.
But now that I see there's no way around it, I guess I'll have to do
that.

> > diff --git a/msg.h b/msg.h
> > index e71d3ceec4b9..f2165084395a 100644
> > --- a/msg.h
> > +++ b/msg.h
> > @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m);
> >   */
> >  int msg_sots_missing(struct ptp_message *m);
> >
> > +/**
> > + * Print a wireshark-compatible hex dump of a message buffer.
> > + * @param pkt  Message buffer to print
> > + * @param cnt  Length of message buffer
> > + * @param fp   An open file pointer.
> > + */
> > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp);
> > +
> >  /**
> >   * Test whether a message has a valid SO_TIMESTAMPING time stamp.
> >   * @param m  Message to test.
> >

Cheers,
-Vladimir


_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel

Reply via email to