Hi Jacob, On Thu, 18 Jun 2020 at 22:44, Jacob Keller <jacob.e.kel...@intel.com> wrote: > > > > On 6/15/2020 8:23 AM, Vladimir Oltean wrote: > > In the current design of the SO_SELECT_ERR_QUEUE socket option (which is > > enabled by sk_timestamping_init on the event fd), it is a bug to only > > check revents & POLLIN, but not also POLLERR. > > > > Normally the error queue events that the application expects (i.e. TX > > timestamps) are received, within a given timeout, in-band with the > > transmission of the timestampable message itself (for example in > > raw_send). > > > > For messages that the application does not / no longer expects, such as > > TX timestamps delivered late, duplicate TX timestamps, general > > exceptional messages enqueued by the kernel in the socket error queue > > etc, ptp4l will be taken by surprise in clock_poll() by these, and will > > think that there is data, since POLLIN is set (but in fact in the socket > > error queue etc, ptp4l will be taken by surprise in clock_poll() by > > these, and will think that there is data, since POLLIN is set (but in > > fact POLLERR is also set, and this has an entirely different meaning). > > > > A very, very simple reproducer is to take a DSA switch and run: > > > > tcpdump -i eth0 -j adapter_unsynced > > > > on its DSA master net device. The above command will enable timestamping > > on that net device, and if both the DSA switch and the master support > > PTP, this will make the kernel send duplicate TX timestamps for every > > sent event packet, which will completely kill ptp4l until a reboot, with > > no indication whatsoever of what's going on. > > > > Since the messages on the error queue are unexpected, we have no need > > for them. And they can be in theory anything, so simply hex dumping > > their content and moving along sounds like a good idea. > > > > Printing them to the user is optional (and helpful), but reading them is > > not. With this patch, even with extraneous data delivered by a buggy > > kernel (which the application now loudly complains about), the > > synchronization keeps chugging along. Otherwise the application starts > > reordering packets in recvmsg() due to misinterpreting which socket > > queue has data available. > > > > Signed-off-by: Vladimir Oltean <olte...@gmail.com> > > --- > > Changes in v2: > > Make msg_raw_dump print to stderr instead of stdout, matching the > > "Unexpected data on socket err queue:" text. > > > > clock.c | 11 +++++++++++ > > msg.c | 12 ++++++++++++ > > msg.h | 8 ++++++++ > > 3 files changed, 31 insertions(+) > > > > diff --git a/clock.c b/clock.c > > index f43cc2afd49e..d61881c5db7a 100644 > > --- a/clock.c > > +++ b/clock.c > > @@ -1559,6 +1559,17 @@ int clock_poll(struct clock *c) > > LIST_FOREACH(p, &c->ports, list) { > > /* Let the ports handle their events. */ > > for (i = 0; i < N_POLLFD; i++) { > > + if (cur[i].revents & POLLERR) { > > + unsigned char pkt[1600]; > > + > > + cnt = recv(cur[i].fd, pkt, sizeof(pkt), > > + MSG_ERRQUEUE); > > + pr_err("Unexpected data on socket err > > queue:"); > > + msg_raw_dump(pkt, cnt, stderr); > > + > > + continue; > > + } > > + > > if (cur[i].revents & (POLLIN|POLLPRI)) { > > event = port_event(p, i); > > if (EV_STATE_DECISION_EVENT == event) { > > diff --git a/msg.c b/msg.c > > index d1619d4973f1..59bb55d6cd89 100644 > > --- a/msg.c > > +++ b/msg.c > > @@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m) > > } > > return msg_sots_valid(m) ? 0 : 1; > > } > > + > > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp) > > +{ > > + int k; > > + > > + for (k = 0; k < cnt; k++) { > > + if (k % 16 == 0) > > + fprintf(fp, "\n%04x ", k); > > + fprintf(fp, "%02x ", pkt[k]); > > + } > > + fprintf(fp, "\n"); > > +} > > Could this use the pr facility, so that it honors the printing options > to log to syslog and manages the level? > > I'm not sure what I would mark these as: debug because they no longer > impact the syncing process, or true error because they are unexpected? > > Thanks, > Jake >
Thanks a lot for looking at this. Good point about redirecting to syslog, I did not think about that. The reason why I did not use pr_err was due to the automatic line ending, which would have unnecessarily complicated this function by having me print each line of the packet to a temporary buffer first. But now that I see there's no way around it, I guess I'll have to do that. > > diff --git a/msg.h b/msg.h > > index e71d3ceec4b9..f2165084395a 100644 > > --- a/msg.h > > +++ b/msg.h > > @@ -395,6 +395,14 @@ void msg_put(struct ptp_message *m); > > */ > > int msg_sots_missing(struct ptp_message *m); > > > > +/** > > + * Print a wireshark-compatible hex dump of a message buffer. > > + * @param pkt Message buffer to print > > + * @param cnt Length of message buffer > > + * @param fp An open file pointer. > > + */ > > +void msg_raw_dump(unsigned char *pkt, int cnt, FILE *fp); > > + > > /** > > * Test whether a message has a valid SO_TIMESTAMPING time stamp. > > * @param m Message to test. > > Cheers, -Vladimir _______________________________________________ Linuxptp-devel mailing list Linuxptp-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linuxptp-devel