> -----Original Message-----
> From: Richard Cochran [mailto:[email protected]]
> Sent: Thursday, April 27, 2017 11:25 AM
> To: David Mirabito <[email protected]>
> Cc: [email protected]
> Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling for tx
> timestamp' on IGB
> 
> On Thu, Apr 27, 2017 at 03:41:03PM +1000, David Mirabito wrote:
> > * "Fixing" (if this is indeed a bug) was reasonably straightforward - more
> > or less reordering steps 4,5,6 so that we wake the app only *after* we've
> > unlocked the bit.
> 
> If your analysis was correct, then yes, indeed this is a driver bug.
> Please submit a patch on netdev.
> 

David, to clarify: are you suggesting that we clear the bit lock earlier,
before calling the timestamp function? I think that's probably a good change,
since minimizing the time we hold a lock is good. I suspect that most of the
Intel drivers have this problem, and I can make some patches for them. Or you
can, if you wish.

There *is* a fundamental limit: the hardware assumes only one transmit
timestamp request is outstanding at a time, so we can't actually do any better
than that. But we can unlock as soon as we read the timestamp registers, which
should help with this race.
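
To illustrate the ordering issue, here is a toy model in Python, not driver
code. Every name in it (TxTimestampSlot, run, release_before_delivery) is
hypothetical, and the callback stands in for the application being woken on
another CPU and immediately sending its next timestamped packet.

```python
class TxTimestampSlot:
    """Toy model of a NIC with one Tx timestamp slot guarded by a bit lock."""

    def __init__(self, release_before_delivery):
        self.busy = False        # stands in for the driver's "in progress" bit
        self.pending = None      # packet currently waiting for its timestamp
        self.skipped = 0         # sends that could not claim the slot
        self.release_before_delivery = release_before_delivery

    def send(self, seq):
        """Request a Tx timestamp for packet seq; False if the slot is taken."""
        if self.busy:
            self.skipped += 1
            return False
        self.busy = True
        self.pending = seq
        return True

    def complete(self, on_timestamp):
        """Driver side: read the timestamp, then hand it to the application."""
        seq = self.pending       # models reading the timestamp registers
        if self.release_before_delivery:
            self.busy = False    # unlock first, then wake the application
            on_timestamp(seq)
        else:
            on_timestamp(seq)    # wake the application while still locked
            self.busy = False


def run(release_before_delivery):
    """Send one packet, then let the woken app send the next one right away."""
    slot = TxTimestampSlot(release_before_delivery)
    results = []

    def app_callback(seq):
        results.append(slot.send(seq + 1))

    slot.send(1)
    slot.complete(app_callback)
    return results, slot.skipped
```

With release_before_delivery=False the follow-up send finds the slot still
busy and gets no timestamp; with True it claims the slot normally.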

Thanks,
Jake

> > Q0: Did I make any immediately bad assumptions in my quest to try to stop
> > these tx_timeouts?
> 
> Hard to tell, but your explanation seems reasonable to me.
> 
> > Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
> > to applications that it's never going to come?
> 
> No.
> 
> > I know there are cases where the packet may be dropped beyond driver's
> > control so the timestamp won't arrive, but does it make sense for drivers
> > to indicate to applications that it knows the timestamp will never come,
> > particularly if the packet was sent?
> 
> I can't imagine that a driver would know this.
> 
> > Q2: Could it be within ptp4l's capabilities to detect such a 'no-timestamp
> > possible' message on the errqueue and do something not quite so drastic as
> > a full reset, especially if it's a transient -EAGAIN type response?
> 
> If you miss a Tx time stamp, then something is wrong.  Probably the
> link is down, but it is hard to reliably know the cause.  I am skeptical
> that this can really be improved in a practical way.
> 
> Really, we should fix the drivers, as you have done, or choose
> non-broken HW.
> 
> [ BTW, if you don't like the long fault interval, just use ASAP. ]
> 
> > This is not there today, but would it be sensible/allowable to try again a
> > few times, with different sequence numbers, etc? Even if not, the PTP
> > protocol should survive the occasional missing packet without a full
> > reset, just maybe the delay value gets a little out of date or whatnot, no?
> 
> I could imagine an option allowing the program to ignore a certain
> number of missed Tx time stamps before throwing the fault.
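
A minimal sketch of what such a tolerance could look like, in Python. This is
not an existing ptp4l option; the class name, max_missed, and the choice to
count consecutive misses (resetting on any success) are all assumptions made
for illustration.

```python
class TxTimestampFaultPolicy:
    """Hypothetical policy: tolerate a few consecutive missed Tx timestamps."""

    def __init__(self, max_missed):
        self.max_missed = max_missed   # misses tolerated before faulting
        self.missed = 0                # consecutive misses seen so far

    def on_timestamp(self):
        """A Tx timestamp arrived in time, so the miss counter starts over."""
        self.missed = 0

    def on_timeout(self):
        """A Tx timestamp timed out; return True if the fault should be thrown."""
        self.missed += 1
        return self.missed > self.max_missed
```

With max_missed=0 the first timeout faults immediately, which is the behaviour
described in this thread.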
> 
> > Q3: This all assumes "well behaved" apps that send one packet,  receiving
> > one timestamp before attempting to send another timestamped packet. Is this
> > mandated, or could an app reasonably expect to send a few packets in a row?
> 
> Many current HW designs do not support this.
> 
> > Sending a second packet will compete against the driver retrieving the
> > timestamp of the first, with no feedback to the app whether it won or not
> > and whether a timestamp may be expected. Does the API allow for more fancy
> > HW with deeper tx-timestamp queues to be fully utilised?
> 
> The API allows fully asynchronous Tx time stamping.  In theory, you
> could send a packet, remember that it deserves a time stamp, then go
> on to other things.  Polling on the error queue would allow you to
> later match CMSGs with the remembered transmitted packets.
> 
> We don't do it that way because 1) this complicates the code for dubious
> benefit* and 2) that would limit the HW you could use.
> 
> * The only benefit I can see would be when sending messages at a very
>   high rate.  So far, I have yet to hear that anyone has run into this
>   limitation.
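
The bookkeeping for the asynchronous approach described above could be
sketched as follows, in Python. It makes no socket calls: how a key is
recovered from an error-queue message is left out, and the class, its methods,
and the use of a caller-chosen key (for example a sequence number) are
assumptions for illustration only.

```python
class PendingTxTimestamps:
    """Remember sent packets and match timestamps to them as they arrive."""

    def __init__(self):
        self.pending = {}   # key -> description of the transmitted message

    def sent(self, key, message):
        """Record that a packet deserving a Tx timestamp was transmitted."""
        self.pending[key] = message

    def on_errqueue(self, key, timestamp):
        """Match a timestamp to its packet; None if the key is unknown."""
        message = self.pending.pop(key, None)
        if message is None:
            return None
        return message, timestamp

    def outstanding(self):
        """Keys of packets still waiting for a timestamp, in sorted order."""
        return sorted(self.pending)
```

Timestamps may then arrive in any order, and anything left in outstanding()
after a deadline is a packet whose timestamp never came.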
> 
> > Q4: Is there some conceptual difference between "Packet was dropped
> > therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
> > timestamp for it"?
> 
> Well, there is a difference, but the poor application will never know
> about it.
> 
> Thanks,
> Richard
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Linuxptp-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel