Hi Wojciech

cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm 


Switching the clocksource to acpi_pm resolved the clock frequency issue:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
chronyc tracking 
Reference ID    : AC1D0218 (bkc2)
Stratum         : 4
Ref time (UTC)  : Fri May 11 11:23:28 2018
System time     : 0.000000000 seconds fast of NTP time
Last offset     : +0.000005277 seconds
RMS offset      : 0.000004877 seconds
Frequency       : 36.614 ppm fast
Residual freq   : +0.048 ppm
Skew            : 3.755 ppm
Root delay      : 0.193692103 seconds
Root dispersion : 0.015338245 seconds
Update interval : 2.0 seconds
Leap status     : Normal


However, it increased the cost of a clock_gettime(CLOCK_REALTIME, ..) call 
to about 1.5 microseconds, up from roughly 20 nanoseconds with tsc. 
clock_gettime(CLOCK_REALTIME, ..) is a very critical function and needs 
to be as fast as possible, so 1.5 microseconds is not workable.


On further investigation with my team, we observed that the TSC 
frequency on this server is higher than the frequency advertised by the 
CPU vendor.

vendor_id    : GenuineIntel
cpu family    : 6
model        : 85
model name    : Intel(R) Xeon(R) Gold 6144 CPU @ 3.50GHz

We calculated the observed TSC rate by counting TSC ticks (rdtsc) 
over a period of about 10 seconds (see the attached code).
TSC ticks per sec = 3596432020, which closely matches the drift I reported 
in my original post.

26814.393 ppm ~ 0.0268 ~ 1/37
(1 - 3500/3596.43) ~ 0.0268 ~ 1/37

This is an overclocked server, and I guess this excess of TSC ticks is 
what makes the system clock gain time.


On Friday, May 11, 2018 at 2:32:02 PM UTC+5:30, Himanshu Sharma wrote:
>
> Hi guys
>
>
> We recently got new Supermicro SYS-1029UX-LL1-S16 20-core servers for 
> testing. We are running them with RedHat 7.4. In our testing we have observed 
> that the system clock on these servers is running way too fast.  
>
> Evidence of observation
>
> 1. Chrony
>
> We sync our servers using chronyd, and the frequency for this one is 
> something we haven't seen anywhere. I am not sure the clock can be slewed 
> to accommodate this much frequency offset. (SYS-1029UX-LL1-S16 is gaining 
> about 26 milliseconds every second.)
>
> > chronyc tracking 
> Reference ID    : xxxxxxxx
> Stratum         : 4
> Ref time (UTC)  : Fri May 11 08:48:20 2018
> System time     : 0.000001912 seconds fast of NTP time
> Last offset     : +0.000001952 seconds
> RMS offset      : 0.000001457 seconds
> Frequency       : 26814.393 ppm fast
> Residual freq   : +0.134 ppm
> Skew            : 0.124 ppm
> Root delay      : 0.194362879 seconds
> Root dispersion : 0.000912781 seconds
> Update interval : 4.0 seconds
> Leap status     : Normal
>
> It is taking forever to adjust the clock rate to accommodate this much 
> frequency difference.
>
>
> 2. PTPD
>
> We also tried running ptpd to sync time with max_offset_ppm = 1000 to 
> adjust for maximum possible frequency difference, but this was also futile. 
>
> > /var/run/ptpd2.event.log
>
> 2018-05-11 14:21:21.249612 ptpd2[113396].enp94s0 (info)      (slv) 
> TimingService.PTP0: acquired clock control
> *2018-05-11 14:21:49.976752 ptpd2[113396].enp94s0 (critical)  (slv) Offset 
> above 1 second (1.001336627 s). Clock will step.*
> 2018-05-11 14:21:48.975450 ptpd2[113396].enp94s0 (warning)   (slv) Stepped 
> the system clock to: 05/11/18 14:21:48.975425504
> 2018-05-11 14:21:49.169540 ptpd2[113396].enp94s0 (notice)    (lstn_reset) 
> Now in state: PTP_LISTENING
> 2018-05-11 14:21:51.022561 ptpd2[113396].enp94s0 (info)      (lstn_reset) 
> UTC offset is now 37
> 2018-05-11 14:21:51.022639 ptpd2[113396].enp94s0 (info)      (lstn_reset) 
> New best master selected: 88f031fffec32ec1(10.30.128.1)/0
> 2018-05-11 14:21:51.022788 ptpd2[113396].enp94s0 (notice)    (slv) Now in 
> state: PTP_SLAVE, Best master: 88f031fffec32ec1(10.30.128.1)/0 
> (IPv4:172.27.210.123)
> 2018-05-11 14:21:51.024175 ptpd2[113396].enp94s0 (notice)    (slv) 
> Received first Sync from Master
> 2018-05-11 14:21:52.057232 ptpd2[113396].enp94s0 (notice)    (slv) 
> Received first Delay Response from Master
> 2018-05-11 14:21:52.057246 ptpd2[113396].enp94s0 (notice)    (slv) 
> Received new Delay Request interval 1 from Master (was: 0)
> 2018-05-11 14:21:59.257989 ptpd2[113396].enp94s0 (notice)    (slv) Servo: 
> Going to slew the clock with the maximum frequency adjustment
> *2018-05-11 14:22:28.029476 ptpd2[113396].enp94s0 (critical)  (slv) Offset 
> above 1 second (1.011173248 s). Clock will step.*
> 2018-05-11 14:22:27.018337 ptpd2[113396].enp94s0 (warning)   (slv) Stepped 
> the system clock to: 05/11/18 14:22:27.18314450
> 2018-05-11 14:22:27.196373 ptpd2[113396].enp94s0 (notice)    (lstn_reset) 
> Now in state: PTP_LISTENING
> 2018-05-11 14:22:29.056742 ptpd2[113396].enp94s0 (info)      (lstn_reset) 
> UTC offset is now 37
> 2018-05-11 14:22:29.056797 ptpd2[113396].enp94s0 (info)      (lstn_reset) 
> New best master selected: 88f031fffec32ec1(10.30.128.1)/0
> 2018-05-11 14:22:29.056843 ptpd2[113396].enp94s0 (notice)    (slv) Now in 
> state: PTP_SLAVE, Best master: 88f031fffec32ec1(10.30.128.1)/0 
> (IPv4:172.27.210.123)
> 2018-05-11 14:22:29.063858 ptpd2[113396].enp94s0 (notice)    (slv) 
> Received first Sync from Master
> 2018-05-11 14:22:30.066198 ptpd2[113396].enp94s0 (notice)    (slv) 
> Received first Delay Response from Master
> 2018-05-11 14:22:30.066215 ptpd2[113396].enp94s0 (notice)    (slv) 
> Received new Delay Request interval 1 from Master (was: 0)
>
> Approximately every 40 seconds the clock has to be stepped back 1 second 
> to compensate for the time gained, which cannot be offset by the frequency 
> adjustment. 
>
>
>
> There is clearly nothing we can do using only frequency adjustment to 
> accommodate a clock frequency this high. We also don't want the clock to go 
> back in time, because that can affect some of the mathematical calculations 
> based on time differences, which are written (not by me) on the premise that 
> time moves only forward (which indeed it should :)).
>
> Would really appreciate any solutions to rectify this issue. 
>

#include <cinttypes>
#include <cstdio>
#include <stdint.h>
#include <sys/time.h>

/* Read the CPU time-stamp counter (x86 only). */
static uint64_t
rdtsc(void)
{
    unsigned long msw;
    unsigned long lsw;

    /* rdtsc returns the low 32 bits in EAX and the high 32 bits in EDX. */
    __asm__ __volatile__("rdtsc" : "=a" (lsw), "=d" (msw) :);
    return (((uint64_t) msw << 32) | lsw);
}

int
main(int argc, char *argv[])
{
    uint64_t start;
    uint64_t end;
    struct timeval tv_start;
    struct timeval tv_end;
    double sec;

    gettimeofday(&tv_start, NULL);
    start = rdtsc();
    /* Spin until the wall clock has advanced by a little over 10 seconds. */
    while (1) {
        gettimeofday(&tv_end, NULL);
        if (tv_end.tv_sec > tv_start.tv_sec + 10)
            break;
    }
    end = rdtsc();

    /* PRIu64 keeps the format specifier correct for uint64_t on any ABI. */
    printf("Total TSC ticks = %" PRIu64 "\n", end - start);
    sec = (tv_end.tv_sec - tv_start.tv_sec) +
        (double) (tv_end.tv_usec - tv_start.tv_usec) / 1000000;
    printf("Time = %.3lf sec\n", sec);
    printf("TSC ticks per sec = %.0lf\n", (double) (end - start) / sec);
    return 0;
}