On 8/20/2025 12:56 AM, Miroslav Lichvar wrote:
> On Tue, Aug 19, 2025 at 04:31:49PM -0700, Jacob Keller wrote:
>> I'm having trouble interpreting what exactly this data shows, as it's
>> quite a lot of data and numbers. I guess that it is showing when it
>> switches over to software timestamps. It would be nice if ntpperf
>> showed the number of events which were software vs hardware
>> timestamping, as that's likely the culprit. igb hardware only has a
>> single outstanding Tx timestamp at a time.
>
> The server doesn't have a way to tell the client (ntpperf) which
> timestamps are HW or SW, we can only guess from the measured offset as
> HW timestamps should be more accurate, but on the server side the
> number of SW and HW TX timestamps provided to the client can be
> monitored with the "chronyc serverstats" command. The server requests
> both SW and HW TX timestamps and uses the better one it gets from the
> kernel, if it can actually get one before it receives the next
> request from the same client (ntpperf simulates up to 16384 concurrent
> clients).
>
> When I run ntpperf at a fixed rate of 140000 requests per second
> for 10 seconds (-r 140000 -t 10), I get the following numbers.
>
> Without the patch:
> NTP daemon TX timestamps   : 28056
> NTP kernel TX timestamps   : 1012864
> NTP hardware TX timestamps : 387239
>
> With the patch:
> NTP daemon TX timestamps   : 28047
> NTP kernel TX timestamps   : 707674
> NTP hardware TX timestamps : 692326
>
> The number of HW timestamps is significantly higher with the patch, so
> that looks good.
>
> But when I increase the rate to 200000, I get this:
>
> Without the patch:
> NTP daemon TX timestamps   : 35835
> NTP kernel TX timestamps   : 1410956
> NTP hardware TX timestamps : 581575
>
> With the patch:
> NTP daemon TX timestamps   : 476908
> NTP kernel TX timestamps   : 646146
> NTP hardware TX timestamps : 412095
>

When does the NTP daemon decide to go with timestamping within the
daemon vs timestamping in the kernel? It seems odd that we don't achieve
100% kernel timestamps...

> With the patch, the server is now dropping requests and can provide
> a smaller number of HW timestamps and also a smaller number of SW
> timestamps, i.e. less work is done overall.
>
> Could the explanation be that a single CPU core now needs to do more
> work, while it was better distributed before?
>

Hm. Maybe the interrupt vector always fires on the same CPU? The work
items can go into the general pool, which spreads across all CPUs, and I
guess the amount of work needed to submit the timestamp is high enough
that doing it in the interrupt ends up costing too much?

Hmm. We could experiment with using a kthread via the ptp_aux_work setup
and tune it to ensure that thread has good prioritization? I don't know
what the best compromise is, since it's clear the interrupt is best if
the timestamp volume isn't too high.
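
As a rough aid for reading the "chronyc serverstats" counters quoted above, the following Python sketch totals the three TX timestamp counters for each run and reports each source's share. The counter values are copied from the quoted figures; the names RUNS, shares and report are made up for illustration, and the sketch assumes the three counters are mutually exclusive, so that their sum approximates the number of timestamped responses.

```python
# Counters copied from the "chronyc serverstats" figures quoted in this thread.
# Keys: (requests per second, patch applied?). Values: (daemon, kernel, hardware).
RUNS = {
    (140000, False): (28056, 1012864, 387239),
    (140000, True): (28047, 707674, 692326),
    (200000, False): (35835, 1410956, 581575),
    (200000, True): (476908, 646146, 412095),
}


def shares(daemon, kernel, hardware):
    """Return (total, daemon %, kernel %, hardware %) for one run.

    Assumption: the three counters are mutually exclusive, so their sum
    is the number of responses that carried a TX timestamp.
    """
    total = daemon + kernel + hardware
    percentages = [100.0 * n / total for n in (daemon, kernel, hardware)]
    return (total, *percentages)


def report(runs=RUNS):
    """Format one summary line per run."""
    lines = []
    for (rate, patched), counts in runs.items():
        total, d, k, h = shares(*counts)
        label = "with patch" if patched else "without patch"
        lines.append(
            f"{rate:>6} req/s, {label:<13}: total={total:>7} "
            f"daemon={d:5.1f}% kernel={k:5.1f}% hardware={h:5.1f}%"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Under that assumption, the hardware share at 140000 requests per second rises from roughly 27% to roughly 48% with the patch, while at 200000 the total falls from 2028366 to 1535149, which is consistent with the server dropping requests.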
