> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Harini Katakam
> Sent: Tuesday, March 12, 2019 3:10 AM
> To: Paul Thomas <pthomas8...@gmail.com>
> Cc: linuxptp-devel@lists.sourceforge.net; net...@vger.kernel.org
> Subject: Re: [Linuxptp-devel] strangeness
>
> Hi Paul,
> On Tue, Mar 12, 2019 at 8:26 AM Paul Thomas <pthomas8...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Let me do a quick clean recap of this issue.
> >
> > On a Debian arm64 system with a 5.0rc8 kernel using the macb driver on
> > zynqmp, enabling tx timestamping (1) breaks networking! The first and
> > most noticeable way is that you can no longer connect with ssh. This
> > is a serious bug somewhere and merits some attention.
> >
> > Trying to debug ssh is a possibility, but I was trying to debug with
> > something easier and thus the netcat testing. The specific issue can
> > be seen in the following strace. In this setup nc just connects to a
> > server and tries to send two packets (2). The first packet goes
> > through fine, but the second doesn't because nc is stuck forever
> > trying to read from the socket.
> >
> > pselect6(4, [0 3], NULL, NULL, NULL, NULL) = 1 (in [0]) <-- waiting on stdin and UDP sock
> > read(0, "c1\n", 8192) = 3 <-- read three chars from stdin
> > write(3, "c1\n", 3) = 3 <-- write those out on the UDP sock
> > pselect6(4, [0 3], NULL, NULL, NULL, NULL) = 1 (in [3]) <-- waiting on stdin and UDP sock
> > read(3, <-- waits forever here as there is no data to read
> >
> > I've been reading more, an old patch and the timestamping.txt doc
> > helped me understand a little more of what's going on:
> > https://lore.kernel.org/netdev/20130328211925.7644.15781.stgit@jekeller-hub.jf.intel.com/
> > https://www.kernel.org/doc/Documentation/networking/timestamping.txt
> >
> > So it is clear that if the SO_SELECT_ERR_QUEUE flag is set then in
> > fact the select should return, but it is not set in this case.
> > I can see everything that is going on in datagram_poll() in datagram.c. The
> > main difference is that in the broken case the mask is 0x30c and in
> > the working case it is 0x304. The difference is EPOLLERR, which is
> > there clearly in the code if !skb_queue_empty(&sk->sk_error_queue).
> >
> > Then in select.c POLLIN_SET includes EPOLLERR. It almost looks as if
> > it's behaving as it should (except that things break). My first
> > question is should the sk_error_queue be empty if there is a tx
> > timestamp available (in datagram_poll() in datagram.c)? If it's not
> > empty I don't see what else the SO_SELECT_ERR_QUEUE flag is doing for the
> > select() and I don't see what would be different about the macb/arm64
> > setup?
>
> Thanks for the summary.
> I think sk_error_queue should be empty because packets are queued to
> that via skb_complete_timestamp (sock_queue_err_skb) and this should
> not be called in this flow. I'm sorry if I'm missing something - I'll let
> others from netdev comment.
> I'm not sure why EPOLLERR is being set in this case.
>
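For anyone following along, here is a small sketch of how the two masks
quoted above decode, and why the extra bit is enough for select() to
report the socket as readable. The bit values are the Linux event bits as
I understand them, and the POLLIN_SET / POLLOUT_SET compositions are my
reading of fs/select.c from around the 5.0 kernel; treat both as
assumptions to check against your own tree. The helper names decode() and
select_reports_readable() are made up for illustration.

```python
# Linux event bits (assumed values, matching the generic uapi definitions).
EPOLLIN = 0x001
EPOLLPRI = 0x002
EPOLLOUT = 0x004
EPOLLERR = 0x008
EPOLLHUP = 0x010
EPOLLRDNORM = 0x040
EPOLLRDBAND = 0x080
EPOLLWRNORM = 0x100
EPOLLWRBAND = 0x200

# Assumed to mirror the macros in fs/select.c (5.0-era source).
POLLIN_SET = EPOLLRDNORM | EPOLLRDBAND | EPOLLIN | EPOLLHUP | EPOLLERR
POLLOUT_SET = EPOLLWRBAND | EPOLLWRNORM | EPOLLOUT | EPOLLERR

NAMES = {
    EPOLLIN: "EPOLLIN",
    EPOLLPRI: "EPOLLPRI",
    EPOLLOUT: "EPOLLOUT",
    EPOLLERR: "EPOLLERR",
    EPOLLHUP: "EPOLLHUP",
    EPOLLRDNORM: "EPOLLRDNORM",
    EPOLLRDBAND: "EPOLLRDBAND",
    EPOLLWRNORM: "EPOLLWRNORM",
    EPOLLWRBAND: "EPOLLWRBAND",
}


def decode(mask):
    """Return the names of the known event bits set in mask, sorted by name."""
    return sorted(name for bit, name in NAMES.items() if mask & bit)


def select_reports_readable(mask):
    """True if select() would put the fd in the read set for this poll mask."""
    return bool(mask & POLLIN_SET)


if __name__ == "__main__":
    for label, mask in (("working", 0x304), ("broken", 0x30C)):
        print(label, hex(mask), decode(mask),
              "readable" if select_reports_readable(mask) else "not readable")
    print("difference:", decode(0x30C ^ 0x304))
```

Under these assumptions the working mask carries only write-side bits,
while the broken mask also carries EPOLLERR, which is part of POLLIN_SET.
That would explain pselect6() returning fd 3 in the read set even though
there is no payload for read() to return.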
I believe that, at least historically, Tx timestamps were delivered to
applications through the error queue.

Thanks,
Jake

_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
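To illustrate what reading from the error queue looks like from user
space, here is a minimal sketch, assuming a Linux host where Python
exposes socket.MSG_ERRQUEUE. The function name drain_error_queue() and
the buffer sizes are hypothetical choices for the example. It does not
enable timestamping itself and does not parse the control messages; it
only shows the non-blocking recvmsg() loop an application would use to
empty the queue so that a later select() is no longer woken by EPOLLERR.

```python
import socket


def drain_error_queue(sock, bufsize=2048, ancbufsize=512):
    """Read every message pending on the socket's error queue.

    Returns a list of (data, ancdata) tuples. With Tx timestamping
    enabled, the timestamp would be expected in ancdata; parsing it is
    left out of this sketch.
    """
    entries = []
    flags = socket.MSG_ERRQUEUE | socket.MSG_DONTWAIT
    while True:
        try:
            data, ancdata, _msg_flags, _addr = sock.recvmsg(
                bufsize, ancbufsize, flags)
        except BlockingIOError:
            # EAGAIN: the error queue is empty, nothing more to read.
            break
        entries.append((data, ancdata))
    return entries


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    print("pending error-queue entries:", len(drain_error_queue(s)))
    s.close()
```

A tool like nc that only calls read() on the socket never consumes these
entries, which would fit the hang in the strace above: select() keeps
reporting the fd while read() has no data to return.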