> -----Original Message-----
> From: Richard Cochran [mailto:[email protected]]
> Sent: Thursday, April 27, 2017 11:25 AM
> To: David Mirabito <[email protected]>
> Cc: [email protected]
> Subject: Re: [Linuxptp-devel] Workaround for 'timed out while polling for tx
> timestamp' on IGB
>
> On Thu, Apr 27, 2017 at 03:41:03PM +1000, David Mirabito wrote:
> > * "Fixing" (if this is indeed a bug) was reasonably straightforward - more
> > or less reordering steps 4,5,6 so that we wake the app only *after* we've
> > unlocked the bit.
>
> If your analysis was correct, then yes, indeed this is a driver bug.
> Please submit a patch on netdev.
>
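For anyone following the thread, the reordering under discussion can be sketched in user-space C. This is only an illustration under stated assumptions, not the igb code: `struct fake_adapter`, `request_tx_timestamp()`, `deliver_and_app_resends()` and both `complete_*` functions are hypothetical names, the C11 atomic stands in for the driver's bit lock, and the "application" is modelled as sending its next timestamped packet the instant it is woken.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state; not the real igb structures. */
struct fake_adapter {
    atomic_bool tx_in_progress; /* single-slot lock: HW handles one Tx stamp at a time */
    uint64_t hw_tx_stamp;       /* pretend Tx timestamp register */
    uint64_t delivered;         /* last timestamp handed to the application */
    int skipped;                /* timestamp requests that found the slot busy */
};

/* Transmit path: claim the single timestamp slot, or give up silently. */
static bool request_tx_timestamp(struct fake_adapter *a)
{
    if (atomic_exchange(&a->tx_in_progress, true)) {
        a->skipped++;           /* slot busy: this packet gets no timestamp */
        return false;
    }
    return true;
}

/* Model of waking the app: it receives the stamp and immediately sends
 * another timestamped packet. Returns whether that send got the slot. */
static bool deliver_and_app_resends(struct fake_adapter *a, uint64_t stamp)
{
    a->delivered = stamp;
    return request_tx_timestamp(a);
}

/* Problematic ordering: the app is woken while the slot is still held. */
static bool complete_wake_then_unlock(struct fake_adapter *a)
{
    uint64_t stamp = a->hw_tx_stamp;
    bool next_ok = deliver_and_app_resends(a, stamp);
    atomic_store(&a->tx_in_progress, false);
    return next_ok;
}

/* Reordered: read the register, release the slot, only then wake the app. */
static bool complete_unlock_then_wake(struct fake_adapter *a)
{
    uint64_t stamp = a->hw_tx_stamp;
    atomic_store(&a->tx_in_progress, false);
    return deliver_and_app_resends(a, stamp);
}
```

With the first ordering the application's follow-up packet loses the slot and would later time out waiting for a timestamp that is never coming; with the second ordering it claims the slot normally. The one-request-at-a-time limit of the hardware is unchanged in both cases.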
Can I clarify here, David: are you suggesting that you instead clear the
bitlock earlier, before you call the timestamp function? I think that's
probably a good thing, since minimizing the time that we hold a lock is good.
I suspect that most of the Intel drivers are at fault here, and I can make
some patches for them. Or you can if you wish.

There *is* a fundamental limit: the hardware assumes only one transmit
timestamp request at a time, so we can't actually do any better than that.
But we can unlock as soon as we read the timestamp registers, which should
help with this race.

Thanks,
Jake

> > Q0: Did I make any immediately bad assumptions in my quest to try to stop
> > these tx_timeouts?
>
> Hard to tell, but your explanation seems reasonable to me.
>
> > Q1: Is there any way for drivers to pass up a 'non-timestamp' to indicate
> > to applications that it's never going to come?
>
> No.
>
> > I know there are cases where the packet may be dropped beyond driver's
> > control so the timestamp won't arrive, but does it make sense for drivers
> > to indicate to applications that it knows the timestamp will never come,
> > particularly if the packet was sent?
>
> I can't imagine that a driver would know this.
>
> > Q2: Could it be within ptp4l's capabilities to detect such a 'no-timestamp
> > possible' message on the errqueue and do something not quite so drastic as
> > a full reset, especially if it's a transient -EAGAIN type response?
>
> If you miss a Tx time stamp, then something is wrong. Probably the
> link is down, but it is hard to reliably know the cause. I am skeptical
> that this can really be improved in a practical way.
>
> Really, we should fix the drivers, as you have done, or choose
> non-broken HW.
>
> [ BTW, if you don't like the long fault interval, just use ASAP. ]
>
> > This is not there today, but would it be sensible/allowable to try again a
> > few times, with different sequence numbers, etc? Even if not, the PTP
> > protocol should survive the occasional missing packet, without a full
> > reset, just maybe the delay value gets a little out of date or whatnot, no?
>
> I could imagine an option allowing the program to ignore a certain
> number of missed Tx time stamps before throwing the fault.
>
> > Q3: This all assumes "well behaved" apps that send one packet, receiving
> > one timestamp before attempting to send another timestamped packet. Is this
> > mandated, or could an app reasonably expect to send a few packets in a row?
>
> Many current HW designs do not support this.
>
> > Sending a second packet will compete against the driver retrieving the
> > timestamp of the first, with no feedback to the app whether it won or not
> > and whether a timestamp may be expected. Does the API allow for more fancy
> > HW with deeper tx-timestamp queues to be fully utilised?
>
> The API allows fully asynchronous Tx time stamping. In theory, you
> could send a packet, remember that it deserves a time stamp, then go
> on to other things. Polling on the error queue would allow you to
> later match CMSGs with the remembered transmitted packets.
>
> We don't do it that way because 1) this complicates the code for dubious
> benefit* and 2) that would limit the HW you could use.
>
> * The only benefit I can see would be when sending messages at a very
> high rate. So far, I have yet to hear that anyone has run into this
> limitation.
>
> > Q4: Is there some conceptual difference between "Packet was dropped
> > therefore no timestamp" and "Packet [maybe?] sent; wasn't able to get a TX
> > timestamp for it"?
>
> Well, there is a difference, but the poor application will never know
> about it.
>
> Thanks,
> Richard
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org!
> http://sdm.link/slashdot
> _______________________________________________
> Linuxptp-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
