On Mon, Nov 08, 2021 at 11:43:55PM +0200, Vladimir Oltean wrote:
> On Mon, Nov 08, 2021 at 01:28:03PM -0800, Christopher Wingert wrote:
> > 
> > 
> > On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
> > > On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
> > > > Hi,
> > > > 
> > > > I am working with a Aquantia AQC 107 ethernet interface.  After the 
> > > > announce
> > > > message is sent on FD_GENERAL, a poll() of the the FD_GENERAL descriptor
> > > > generates a POLLERR.  I see 3 delay messages go out the interface on
> > > > FD_EVENT (previous to the announce message) without issue (no socket 
> > > > error
> > > > on read on the FD_EVENT descriptor).
> > > > 
> > > > The only difference i see between the two sockets is how the 
> > > > sock_filter is
> > > > setup.
> > > > 
> > > > I am thinking this is an issue with the Aquantia driver, as the same 
> > > > command
> > > > on a Mellanox Connect X5 works fine.
> > > > 
> > > > Has anyone seen this issue or have a clue as to where I should start?
> > > > 
> > > > Thanks!
> > > > Chris
> > > > 
> > > > 
> > > > ptp4l command line : ptp4l -i els1 -H -P -2 -m
> > > > Kernel is 4.18
> > > > I downloaded the latest Atlantic driver from the Marvell website 
> > > > 2.4.14.0
> > > > I have upgraded the AQC 107 firmware to 3.1.121
> > > I've no experience with this driver whatsoever, but generally, what
> > > ptp4l receives on the error queue of a socket is a TX timestamp. What is
> > > surprising is that there's a TX timestamp for a general (not event)
> > > message, because ptp4l does not ask these to be timestamped.
> > > 
> > > Apart from the error messages, does the system otherwise behave ok?
> > > 
> > > You can try to read from the general message socket into a packet buffer
> > > and hexdump it, put it in tcpdump and see what it is. Then the next step
> > > might be to process its control messages (cmsg), although my first guess
> > > would be that TX timestamping is what's going on.
> > > 
> > > There are plenty of things that could go wrong in a driver (especially
> > > in one you downloaded from the vendor's website and not from kernel.org).
> > > If you're handy with the source code, you can check what is the
> > > condition based on which this driver offers hardware TX timestamps to
> > > the stack. It should be if skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
> > > is set for that packet, AND hardware TX timestamping was requested
> > > through HWTSTAMP_TX_ON.
> > 
> > Thank you for the quick response!
> > 
> > This is what the current version from git looks like on the 107 without any
> > code changes (3 delay requests, 1 announce), this loops indefinitely and
> > MASTER never gets enabled.
> > ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
> > ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
> > INIT_COMPLETE
> > ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on
> > INIT_COMPLETE
> > ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on
> > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
> > ptp4l[506141.948]: port 1 (els1): assuming the grand master role
> > ptp4l[506141.950]: port 1 (els1): unexpected socket error
> > ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > (FT_UNSPECIFIED)
> > 
> > 
> > I changed raw.c function raw_send() to the below code to get the timestamp
> > on both sockets.
> >    /*
> >     * Get the time stamp right away.
> >     */
> >    // return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts, 
> > MSG_ERRQUEUE) : cnt;
> >    if ( event == TRANS_EVENT ) return sk_receive(fd, pkt, len, NULL, hwts, 
> > MSG_ERRQUEUE);
> >    if ( event == TRANS_GENERAL ) return sk_receive(fd, pkt, len, NULL, 
> > hwts, MSG_ERRQUEUE);
> >    return cnt;
> > 
> > This is the result.
> > ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
> > ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
> > ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on 
> > INIT_COMPLETE
> > ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on 
> > INIT_COMPLETE
> > ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on 
> > ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
> > ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
> > ptp4l[506208.757]: port 1 (els1): assuming the grand master role
> > ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
> > ptp4l[506208.759]: port 1 (els1): send announce failed
> > ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
> > (FT_UNSPECIFIED)
> > 
> > Unless there is something wrong in my code change, it doesn't seem to be a
> > timestamp.
> > 
> > Are you saying that every POLLERR should be combined with a message in the
> > Error Queue?
> 
> It's still implausible that it's not a timestamp (and I don't know what
> it can be if that's not it). "man poll" only says:
> 
>        POLLERR
>               Error condition (only returned in revents; ignored in
>               events).  This bit is also set for a file descriptor
>               referring to the write end of a pipe when the read end has
>               been closed.
> 
> and since ptp4l does not open connection-oriented sockets for general
> PTP messages, I don't think it can detect that the read end has been
> closed.
> 
> What seems to be more likely to be going on is that you haven't made all
> changes necessary for reading TX timestamps from the error queue of the
> general socket. Have you called sk_timestamping_init?
> 
>       flags = 1;
>       if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
>                      &flags, sizeof(flags)) < 0) {
>               pr_warning("%s: SO_SELECT_ERR_QUEUE: %m", device);
>               sk_events = 0;
>               sk_revents = POLLERR;
>       }
> 
> introduced by this kernel commit:
> 
> commit 7d4c04fc170087119727119074e72445f2bb192b
> Author: Keller, Jacob E <jacob.e.kel...@intel.com>
> Date:   Thu Mar 28 11:19:25 2013 +0000
> 
>     net: add option to enable error queue packets waking select
> 
>     Currently, when a socket receives something on the error queue it only 
> wakes up
>     the socket on select if it is in the "read" list, that is the socket has
>     something to read. It is useful also to wake the socket if it is in the 
> error
>     list, which would enable software to wait on error queue packets without 
> waking
>     up for regular data on the socket. The main use case is for receiving
>     timestamped transmit packets which return the timestamp to the socket via 
> the
>     error queue. This enables an application to select on the socket for the 
> error
>     queue only instead of for the regular traffic.
> 
>     -v2-
>     * Added the SO_SELECT_ERR_QUEUE socket option to every architechture 
> specific file
>     * Modified every socket poll function that checks error queue
> 
>     Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
>     Cc: Jeffrey Kirsher <jeffrey.t.kirs...@intel.com>
>     Cc: Richard Cochran <richardcoch...@gmail.com>
>     Cc: Matthew Vick <matthew.v...@intel.com>
>     Signed-off-by: David S. Miller <da...@davemloft.net>
> 
> So you effectively cannot call poll() or select() on the error queue of
> a socket without enabling this option. Also, I think the sk_receive()
> function messes up quite badly, because of this incosistent mode in
> which it's operating. See, it looks at this global variable called
> sk_revents to figure out which events is poll() supposed to return. But
> the code was written assuming that there's a single socket on which you
> will poll for TX timestamps. And you have two, and configured
> differently, at that: on one you call sk_timestamping_init() and on the
> other you don't (or at least you don't mention that you do).

I think we are running around in circles. If you call sk_timestamping_init()
you are effectively requesting TX timestamps for general messages,
therefore changing the premise of the issue.

Can you instead remove that check for !(pfd.revents & sk_revents)?


_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

Reply via email to