On 11/8/2021 1:43 PM, Vladimir Oltean wrote:
On Mon, Nov 08, 2021 at 01:28:03PM -0800, Christopher Wingert wrote:

On 11/8/2021 12:38 PM, Vladimir Oltean wrote:
On Mon, Nov 08, 2021 at 12:11:11PM -0800, Christopher Wingert wrote:
Hi,

I am working with a Aquantia AQC 107 ethernet interface.  After the announce
message is sent on FD_GENERAL, a poll() of the the FD_GENERAL descriptor
generates a POLLERR.  I see 3 delay messages go out the interface on
FD_EVENT (previous to the announce message) without issue (no socket error
on read on the FD_EVENT descriptor).

The only difference i see between the two sockets is how the sock_filter is
setup.

I am thinking this is an issue with the Aquantia driver, as the same command
on a Mellanox Connect X5 works fine.

Has anyone seen this issue or have a clue as to where I should start?

Thanks!
Chris


ptp4l command line : ptp4l -i els1 -H -P -2 -m
Kernel is 4.18
I downloaded the latest Atlantic driver from the Marvell website 2.4.14.0
I have upgraded the AQC 107 firmware to 3.1.121
I've no experience with this driver whatsoever, but generally, what
ptp4l receives on the error queue of a socket is a TX timestamp. What is
surprising is that there's a TX timestamp for a general (not event)
message, because ptp4l does not ask these to be timestamped.

Apart from the error messages, does the system otherwise behave ok?

You can try to read from the general message socket into a packet buffer
and hexdump it, put it in tcpdump and see what it is. Then the next step
might be to process its control messages (cmsg), although my first guess
would be that TX timestamping is what's going on.

There are plenty of things that could go wrong in a driver (especially
in one you downloaded from the vendor's website and not from kernel.org).
If you're handy with the source code, you can check what is the
condition based on which this driver offers hardware TX timestamps to
the stack. It should be if skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP
is set for that packet, AND hardware TX timestamping was requested
through HWTSTAMP_TX_ON.
Thank you for the quick response!

This is what the current version from git looks like on the 107 without any
code changes (3 delay requests, 1 announce), this loops indefinitely and
MASTER never gets enabled.
ptp4l[506134.862]: selected /dev/ptp11 as PTP clock
ptp4l[506134.889]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506134.889]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on
INIT_COMPLETE
ptp4l[506134.889]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on
INIT_COMPLETE
ptp4l[506141.948]: port 1 (els1): LISTENING to MASTER on
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[506141.948]: selected local clock ac1f6b.fffe.dce92d as best master
ptp4l[506141.948]: port 1 (els1): assuming the grand master role
ptp4l[506141.950]: port 1 (els1): unexpected socket error
ptp4l[506141.950]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
(FT_UNSPECIFIED)


I changed raw.c function raw_send() to the below code to get the timestamp
on both sockets.
    /*
     * Get the time stamp right away.
     */
    // return event == TRANS_EVENT ? sk_receive(fd, pkt, len, NULL, hwts, 
MSG_ERRQUEUE) : cnt;
    if ( event == TRANS_EVENT ) return sk_receive(fd, pkt, len, NULL, hwts, 
MSG_ERRQUEUE);
    if ( event == TRANS_GENERAL ) return sk_receive(fd, pkt, len, NULL, hwts, 
MSG_ERRQUEUE);
    return cnt;

This is the result.
ptp4l[506201.215]: selected /dev/ptp11 as PTP clock
ptp4l[506201.245]: port 1 (els1): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[506201.245]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on 
INIT_COMPLETE
ptp4l[506201.245]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on 
INIT_COMPLETE
ptp4l[506208.757]: port 1 (els1): LISTENING to MASTER on 
ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES
ptp4l[506208.757]: selected local clock ac1f6b.fffe.dce92d as best master
ptp4l[506208.757]: port 1 (els1): assuming the grand master role
ptp4l[506208.759]: poll for tx timestamp woke up on non ERR event
ptp4l[506208.759]: port 1 (els1): send announce failed
ptp4l[506208.759]: port 1 (els1): MASTER to FAULTY on FAULT_DETECTED
(FT_UNSPECIFIED)

Unless there is something wrong in my code change, it doesn't seem to be a
timestamp.

Are you saying that every POLLERR should be combined with a message in the
Error Queue?
It's still implausible that it's not a timestamp (and I don't know what
it can be if that's not it). "man poll" only says:

        POLLERR
               Error condition (only returned in revents; ignored in
               events).  This bit is also set for a file descriptor
               referring to the write end of a pipe when the read end has
               been closed.

and since ptp4l does not open connection-oriented sockets for general
PTP messages, I don't think it can detect that the read end has been
closed.

What seems to be more likely to be going on is that you haven't made all
changes necessary for reading TX timestamps from the error queue of the
general socket. Have you called sk_timestamping_init?

        flags = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
                       &flags, sizeof(flags)) < 0) {
                pr_warning("%s: SO_SELECT_ERR_QUEUE: %m", device);
                sk_events = 0;
                sk_revents = POLLERR;
        }

introduced by this kernel commit:

commit 7d4c04fc170087119727119074e72445f2bb192b
Author: Keller, Jacob E <jacob.e.kel...@intel.com>
Date:   Thu Mar 28 11:19:25 2013 +0000

     net: add option to enable error queue packets waking select

     Currently, when a socket receives something on the error queue it only 
wakes up
     the socket on select if it is in the "read" list, that is the socket has
     something to read. It is useful also to wake the socket if it is in the 
error
     list, which would enable software to wait on error queue packets without 
waking
     up for regular data on the socket. The main use case is for receiving
     timestamped transmit packets which return the timestamp to the socket via 
the
     error queue. This enables an application to select on the socket for the 
error
     queue only instead of for the regular traffic.

     -v2-
     * Added the SO_SELECT_ERR_QUEUE socket option to every architechture 
specific file
     * Modified every socket poll function that checks error queue

     Signed-off-by: Jacob Keller <jacob.e.kel...@intel.com>
     Cc: Jeffrey Kirsher <jeffrey.t.kirs...@intel.com>
     Cc: Richard Cochran <richardcoch...@gmail.com>
     Cc: Matthew Vick <matthew.v...@intel.com>
     Signed-off-by: David S. Miller <da...@davemloft.net>

So you effectively cannot call poll() or select() on the error queue of
a socket without enabling this option. Also, I think the sk_receive()
function messes up quite badly, because of this incosistent mode in
which it's operating. See, it looks at this global variable called
sk_revents to figure out which events is poll() supposed to return. But
the code was written assuming that there's a single socket on which you
will poll for TX timestamps. And you have two, and configured
differently, at that: on one you call sk_timestamping_init() and on the
other you don't (or at least you don't mention that you do).

Again, thank you for the quick response.  I owe you a beer.

Adding the call to sk_timestamping_init() in raw_open() for FD_GENERAL allowed ptp to come online to a steady state and assume a MASTER role.

I am guessing the AQC 107 driver is getting confused on which socket actually wants timestamps.

Thanks again!
Chris





_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users

Reply via email to