On Fri, Oct 15, 2021 at 12:01:24AM +0000, brian.hutchin...@l3harris.com wrote:
> > > > If this is a "stack" issue, what can I do to reduce the "message rate"
> > > > or "grant duration" if these are related to whatever a "stack" issue
> > > > is?
> > >
> > > I'd be willing to put my money on a driver bug. But for that you'd
> > > need to confirm that the issue reproduces with the default.cfg and not
> > > just with the G.8275.2 profile. Don't try to run before you can walk.
>
> So I ran tests using a plain 1588 profile and E2E and yes the problem still 
> happens.  Here is that config:

There's something that just doesn't compute for me.
In those patches, Christian wrote:

        /* Currently, only P2P delay measurement is supported.  Setting ocmode
         * to slave will work independently of actually being master or slave.
         * For E2E delay measurement, switching between master and slave would
         * be required, as the KSZ devices filter out PTP messages depending on
         * the ocmode setting:
         * - in slave mode, DelayReq messages are filtered out
         * - in master mode, Sync messages are filtered out
         * Currently (and probably also in future) there is no interface in the
         * kernel which allows switching between master and slave mode.  For
         * this reason, E2E cannot be supported. See patchwork for full
         * discussion:
         * https://patchwork.ozlabs.org/project/netdev/patch/20201019172435.4416-8-cegg...@arri.de/
         */
        ksz9477_ptp_tcmode_set(dev, KSZ9477_PTP_TCMODE_P2P);
        ksz9477_ptp_ocmode_set(dev, KSZ9477_PTP_OCMODE_SLAVE);
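To make the rule in that comment concrete, here is a small user-space model of it. This is not driver code: `enum ocmode`, `OCMODE_*` and `ocmode_filters()` are made-up names. It encodes only what the comment states, and like the comment it does not say whether the filtering happens on ingress, egress or both:

```c
#include <stdbool.h>

/* PTP event messageType values from IEEE 1588 (low nibble of header byte 0). */
enum ptp_msg_type {
	PTP_SYNC        = 0x0,
	PTP_DELAY_REQ   = 0x1,
	PTP_PDELAY_REQ  = 0x2,
	PTP_PDELAY_RESP = 0x3,
};

/* Hypothetical names for the two OCMODE settings the comment describes. */
enum ocmode { OCMODE_SLAVE, OCMODE_MASTER };

/*
 * Model of the filtering rule quoted in the driver comment:
 *  - slave mode filters out Delay_Req
 *  - master mode filters out Sync
 * Peer-delay messages are not affected in either mode, consistent with
 * the comment saying P2P works with OCMODE set to slave regardless of
 * the port's actual role.
 */
static bool ocmode_filters(enum ocmode mode, enum ptp_msg_type type)
{
	switch (mode) {
	case OCMODE_SLAVE:
		return type == PTP_DELAY_REQ;
	case OCMODE_MASTER:
		return type == PTP_SYNC;
	}
	return false;
}
```

With OCMODE hard-coded to SLAVE as above, Delay_Req is the one message type the rule filters, and it is exactly the message an E2E slave has to send.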

Did you modify the driver's OCMODE? I am super confused as to which
packets ptp4l is actually waiting for a TX timestamp for. Because if
you're using E2E and not P2P, then the entire ksz9477_port_deferred_xmit()
is just dead code, is it not?

> [global]
> #
> # Default Data Set

(summary of your changes)

twoStepFlag: 1 to 0
slaveOnly: 0 to 1
clockClass: 248 to 6
fault_reset_interval: 4 to -128
tx_timestamp_timeout: 10 to 1000
unicast_listen: 0 to 1
unicast_req_duration: 3600 to 300
summary_interval: 0 to 4
time_stamping: hardware to p2p1step
tsproc_mode: filter to raw_weight

Can you just print the packet in ptp4l? Apart from those changes you're
using the default.cfg settings, which means the UDPv4 network_transport,
so the place to look is udp_send():

static int udp_send(struct transport *t, struct fdarray *fda,
                    enum transport_event event, int peer, void *buf, int len,
                    struct address *addr, struct hw_timestamp *hwts)
...

        cnt = sendto(fd, buf, len, 0, &addr->sa, sizeof(addr->sin));
        if (cnt < 1) {
                pr_err("sendto failed: %m");
                return -errno;
        }
        /*
         * Get the time stamp right away.
         */
        return event == TRANS_EVENT ? sk_receive(fd, junk, len, NULL, hwts,
                                                 MSG_ERRQUEUE) : cnt;
                                      ^
                                      you can print the buf here if
                                      sk_receive returns negative
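Something along these lines would do for the print. It is a hypothetical helper (`dump_ptp_msg()` is not part of ptp4l) and it assumes the standard 34-byte IEEE 1588 common header, with messageType in the low nibble of byte 0 and sequenceId big-endian at offset 30:

```c
#include <stdint.h>
#include <stdio.h>

/* Name of a PTP messageType (IEEE 1588 header byte 0, low nibble). */
static const char *ptp_msg_name(unsigned int type)
{
	switch (type) {
	case 0x0: return "Sync";
	case 0x1: return "Delay_Req";
	case 0x2: return "Pdelay_Req";
	case 0x3: return "Pdelay_Resp";
	case 0x8: return "Follow_Up";
	case 0x9: return "Delay_Resp";
	case 0xA: return "Pdelay_Resp_Follow_Up";
	case 0xB: return "Announce";
	case 0xC: return "Signaling";
	case 0xD: return "Management";
	default:  return "unknown";
	}
}

/*
 * Hypothetical debug helper: print the messageType, sequenceId and a hex
 * dump of an outgoing PTP message. Returns the messageType, or -1 if the
 * buffer is shorter than the 34-byte common header.
 */
static int dump_ptp_msg(FILE *out, const void *buf, int len)
{
	const uint8_t *p = buf;
	unsigned int type;
	uint16_t seq;

	if (len < 34)
		return -1;

	type = p[0] & 0x0f;                      /* messageType */
	seq = (uint16_t)((p[30] << 8) | p[31]);  /* sequenceId, big-endian */

	fprintf(out, "no TX timestamp for %s seq %u, %d bytes:",
		ptp_msg_name(type), (unsigned int)seq, len);
	for (int i = 0; i < len; i++)
		fprintf(out, "%s%02x", (i % 16) ? " " : "\n  ",
			(unsigned int)p[i]);
	fputc('\n', out);
	return (int)type;
}
```

To use it you would have to split that return statement: keep sk_receive()'s result in a local variable, call dump_ptp_msg(stderr, buf, len) when it is negative, then return it.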

The only caller I can find from which this makes sense is
port_delay_request():
        if (port_prepare_and_send(p, msg, TRANS_EVENT)) {
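For reference: only event messages (messageType 0x0 to 0x3 in IEEE 1588) are sent as TRANS_EVENT, and therefore only they wait for a TX timestamp in udp_send(). A simplified model of which event messages a port transmits (the enums and `port_transmits()` are illustrative, not ptp4l code) shows why Delay_Req is the only candidate for a slaveOnly E2E port:

```c
#include <stdbool.h>

/* IEEE 1588 messageType values (low nibble of header byte 0). */
enum ptp_msg_type {
	SYNC = 0x0, DELAY_REQ = 0x1, PDELAY_REQ = 0x2, PDELAY_RESP = 0x3,
	FOLLOW_UP = 0x8, DELAY_RESP = 0x9, PDELAY_RESP_FOLLOW_UP = 0xA,
	ANNOUNCE = 0xB, SIGNALING = 0xC, MANAGEMENT = 0xD,
};

/* Event messages (0x0-0x3) are timestamped on transmit; general ones are not. */
static bool ptp_is_event(enum ptp_msg_type t)
{
	return t <= PDELAY_RESP;
}

/* Illustrative port role and delay mechanism. */
enum role { ROLE_MASTER, ROLE_SLAVE };
enum delay_mech { DM_E2E, DM_P2P };

/*
 * Would a port in this role, using this delay mechanism, transmit the
 * given event message? Simplified: ignores transitional port states and
 * assumes P2P ports exchange peer-delay messages in either role.
 */
static bool port_transmits(enum role r, enum delay_mech dm,
			   enum ptp_msg_type t)
{
	switch (t) {
	case SYNC:
		return r == ROLE_MASTER;
	case DELAY_REQ:
		return dm == DM_E2E && r == ROLE_SLAVE;
	case PDELAY_REQ:
	case PDELAY_RESP:
		return dm == DM_P2P;
	default:
		return false;  /* general message: no TX timestamp involved */
	}
}
```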

But that further suggests that you've modified the driver, because:

/* Defer transmit if waiting for egress time stamp is required.  */
static struct sk_buff *ksz9477_defer_xmit(struct dsa_port *dp,
                                          struct sk_buff *skb)
{
        ...
        /* Use cached PTP msg type from ksz9477_ptp_port_txtstamp().  */
        ptp_msg_type = KSZ9477_SKB_CB(clone)->ptp_msg_type;
        if (ptp_msg_type != PTP_MSGTYPE_PDELAY_REQ)
                goto out_free_clone;  /* only PDelay_Req is deferred */
        ...
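Restated as a user-space model (`ksz_would_defer()` and `count_deferred()` are made-up names; the message type values are the IEEE 1588 ones, named to match the excerpt), that check means a stream of Delay_Req messages never takes the deferred path:

```c
#include <stdbool.h>

/* IEEE 1588 event messageType values, named as in the excerpt. */
enum ptp_msg_type {
	PTP_MSGTYPE_SYNC        = 0x0,
	PTP_MSGTYPE_DELAY_REQ   = 0x1,
	PTP_MSGTYPE_PDELAY_REQ  = 0x2,
	PTP_MSGTYPE_PDELAY_RESP = 0x3,
};

/* Same test as the excerpt: only Pdelay_Req is deferred. */
static bool ksz_would_defer(enum ptp_msg_type type)
{
	return type == PTP_MSGTYPE_PDELAY_REQ;
}

/* Count how many of the transmitted event messages would be deferred. */
static int count_deferred(const enum ptp_msg_type *tx, int n)
{
	int deferred = 0;

	for (int i = 0; i < n; i++)
		if (ksz_would_defer(tx[i]))
			deferred++;
	return deferred;
}
```

An E2E slave only ever feeds Delay_Req into this, so the count stays at zero, which is why the deferred-xmit path looks unused under E2E.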

So could you share the exact list of changes you've made to the patches
from the form that they were posted in?

>
> And I did find a bug in the DSA driver but it didn't appear to change 
> anything.
>
> In ksz9477_ptp_txtstamp_skb function the "ret" that is being assigned
> by "wait_for_completion_timeout" returning is declared as an "int"
> instead of an "unsigned long" so I fixed that.

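Doesn't really make a difference, even on a 64-bit machine:
wait_for_completion_timeout() returns either 0 or the number of jiffies
left, which can never exceed the timeout you passed in, so it fits in an int.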
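A user-space sketch of why. The `model_wait_for_completion_timeout()` function is made up and only mimics the documented return value (0 on timeout, otherwise the jiffies left, between 1 and the timeout); it shows that storing the result in an int gives the same timeout decision as an unsigned long for any timeout that fits in an int:

```c
#include <stdbool.h>

/* Hypothetical model of the wait_for_completion_timeout() return value. */
static unsigned long model_wait_for_completion_timeout(unsigned long timeout,
							unsigned long completes_after)
{
	if (completes_after >= timeout)
		return 0;                      /* timed out */
	return timeout - completes_after;      /* jiffies left: 1..timeout */
}

/* The driver's original pattern: result stored in an int. */
static bool timed_out_int(unsigned long timeout, unsigned long completes_after)
{
	int ret = model_wait_for_completion_timeout(timeout, completes_after);

	return ret == 0;
}

/* The "fixed" pattern: result stored in an unsigned long. */
static bool timed_out_ulong(unsigned long timeout, unsigned long completes_after)
{
	unsigned long ret = model_wait_for_completion_timeout(timeout,
							      completes_after);

	return ret == 0;
}
```

Declaring ret as unsigned long matches the documented return type, so the change is fine to keep; it just shouldn't change behaviour.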
Nonetheless, is that the sticking point? Do you see this error message
in dmesg when user space loses the TX timestamp?

                dev_err(dev->dev, "timeout waiting for time stamp\n");

> ... still looking for other stuff but again, I'm probably not
> experienced enough (yet) with DSA and LinuxPTP to do much good.


_______________________________________________
Linuxptp-users mailing list
Linuxptp-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-users
