> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Harini Katakam
> Sent: Tuesday, March 12, 2019 3:10 AM
> To: Paul Thomas <pthomas8...@gmail.com>
> Cc: linuxptp-devel@lists.sourceforge.net; net...@vger.kernel.org
> Subject: Re: [Linuxptp-devel] strangeness
> 
> Hi Paul,
> On Tue, Mar 12, 2019 at 8:26 AM Paul Thomas <pthomas8...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Let me do a quick clean recap of this issue.
> >
> > On a Debian arm64 system with a 5.0rc8 kernel using the macb driver on
> > zynqmp, enabling tx timestamping (1) breaks networking! The first and
> > most noticeable way is that you can no longer connect with ssh. This
> > is a serious bug somewhere and merits some attention.
> >
> > Trying to debug ssh is a possibility, but I was trying to debug with
> > something easier and thus the netcat testing. The specific issue can
> > be seen in the following strace. In this setup nc just connects to a
> > server and tries to send two packets (2). The first packet goes
> > through fine, but the second doesn't because nc is stuck forever
> > trying to read from the socket.
> > pselect6(4, [0 3], NULL, NULL, NULL, NULL) = 1 (in [0])  <-- waiting on stdin and UDP sock
> > read(0, "c1\n", 8192) = 3  <-- read three chars from stdin
> > write(3, "c1\n", 3) = 3  <-- write those out on the UDP sock
> > pselect6(4, [0 3], NULL, NULL, NULL, NULL) = 1 (in [3])  <-- waiting on stdin and UDP sock
> > read(3,  <-- waits forever here as there is no data to read
> >
> > I've been reading more; an old patch and the timestamping.txt doc
> > helped me understand a little more of what's going on:
> > https://lore.kernel.org/netdev/20130328211925.7644.15781.stgit@jekeller-hub.jf.intel.com/
> > https://www.kernel.org/doc/Documentation/networking/timestamping.txt
> >
> > So it is clear that if the SO_SELECT_ERR_QUEUE flag is set then in
> > fact the select should return, but it is not set in this case. I can
> > see everything that is going on in datagram_poll() in datagram.c. The
> > main difference is that in the broken case the mask is 0x30c and in
> > the working case it is 0x304. The difference is EPOLLERR (0x008), which
> > the code clearly sets if !skb_queue_empty(&sk->sk_error_queue).
> >
> > Then in select.c POLLIN_SET includes EPOLLERR. It almost looks as if
> > it's behaving as it should (except that things break). My first
> > question is should the sk_error_queue be empty if there is a tx
> > timestamp available (in datagram_poll() in datagram.c)? If it's not
> > empty I don't see what else SO_SELECT_ERR_QUEUE flag is doing for the
> > select() and I don't see what would be different about the macb/arm64
> > setup?
> 
> Thanks for the summary.
> I think sk_error_queue should be empty because packets are queued to
> that via skb_complete_timestamp (sock_queue_err_skb) and this should
> not be called in this flow. I'm sorry if I'm missing something; I'll let
> others from netdev comment.
> I'm not sure why EPOLLERR is being set in this case.
> 
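To make the two masks quoted above easier to compare, here is a small sketch that decodes them. The bit values are the ones from include/uapi/linux/eventpoll.h, and POLLIN_SET is copied from my reading of fs/select.c; the helper names decode_mask() and select_marks_readable() are made up for illustration.

```python
# Poll event bits, values as in include/uapi/linux/eventpoll.h.
EPOLL_BITS = {
    0x001: "EPOLLIN",
    0x002: "EPOLLPRI",
    0x004: "EPOLLOUT",
    0x008: "EPOLLERR",
    0x010: "EPOLLHUP",
    0x040: "EPOLLRDNORM",
    0x080: "EPOLLRDBAND",
    0x100: "EPOLLWRNORM",
    0x200: "EPOLLWRBAND",
}
BIT = {name: bit for bit, name in EPOLL_BITS.items()}

# Bits that make select() report an fd in the read set (per fs/select.c).
POLLIN_SET = (BIT["EPOLLRDNORM"] | BIT["EPOLLRDBAND"] | BIT["EPOLLIN"]
              | BIT["EPOLLHUP"] | BIT["EPOLLERR"])


def decode_mask(mask):
    """Hypothetical helper: list the names of the bits set in a poll mask."""
    return [name for bit, name in sorted(EPOLL_BITS.items()) if mask & bit]


def select_marks_readable(mask):
    """Hypothetical helper: would select() put this fd in the read set?"""
    return bool(mask & POLLIN_SET)


for mask in (0x304, 0x30c):
    print(hex(mask), decode_mask(mask), "readable:", select_marks_readable(mask))
```

The only bit that differs is EPOLLERR, and because POLLIN_SET contains EPOLLERR, a non-empty error queue alone is enough for pselect6() to report the socket as readable even though there is no data for read() to return.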

I believe that, at least historically, Tx timestamps have been delivered to
applications through the socket's error queue.
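
For illustration, this is roughly how an application that asked for Tx timestamps would empty that queue, using recvmsg() with MSG_ERRQUEUE as described in timestamping.txt. It is only a sketch: drain_error_queue() is a made-up helper, the 0x2000 fallback is the Linux value of MSG_ERRQUEUE and is an assumption for Python builds that do not export it, and the ancillary data is returned unparsed.

```python
import socket

# Linux value from <linux/socket.h>, used only if the socket module lacks it.
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)


def drain_error_queue(sock, bufsize=2048, ancbufsize=512):
    """Hypothetical helper: return every message queued on sock's error queue.

    Each entry is (data, ancdata). For a Tx timestamp, data is the looped
    back packet and ancdata carries the timestamp control message.
    """
    messages = []
    while True:
        try:
            data, ancdata, _flags, _addr = sock.recvmsg(
                bufsize, ancbufsize, MSG_ERRQUEUE | socket.MSG_DONTWAIT)
        except BlockingIOError:
            # EAGAIN: the error queue is empty.
            break
        messages.append((data, ancdata))
    return messages
```

A program such as nc that never reads with MSG_ERRQUEUE has no way to consume anything queued there, which would fit the behaviour in the strace above.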

Thanks,
Jake

