> > This function (cpu_global_time()) is called only when we have first
> > checked that the TSC is invariant. We also measure the TSC frequency in
> > that case. This function is defined in the same file as cpu_cycles(),
> > and the file is x86 specific. So we know what we are doing, and are just
> > re-using the code to read the TSC.
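Here is a rough sketch of the two preconditions described above. It assumes GCC or Clang on x86 (`<cpuid.h>`, `<x86intrin.h>`). The invariant-TSC flag is CPUID leaf 0x80000007, EDX bit 8. The helpers `tsc_is_invariant()` and `tsc_measure_hz()` are hypothetical names, not the ODP functions. Calibrating against `CLOCK_MONOTONIC` over a short sleep is only one way to estimate the frequency, and the patch may do it differently.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

/* Hypothetical helper: true if CPUID reports an invariant TSC. */
static bool tsc_is_invariant(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when leaf 0x80000007 is not supported. */
	if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
		return false;
	return (edx & (1u << 8)) != 0; /* EDX bit 8: invariant TSC */
#else
	return false; /* not x86: no TSC */
#endif
}

/* Monotonic wall-clock time in nanoseconds, used as the reference. */
static inline uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: estimate TSC ticks per second over ~10 ms.
 * Returns 0 when the TSC is not invariant (or not present). */
static uint64_t tsc_measure_hz(void)
{
#if defined(__x86_64__) || defined(__i386__)
	struct timespec req = { 0, 10 * 1000 * 1000 }; /* 10 ms */
	uint64_t t0, t1, c0, c1;

	if (!tsc_is_invariant())
		return 0;
	t0 = mono_ns();
	c0 = __rdtsc();
	nanosleep(&req, NULL);
	t1 = mono_ns();
	c1 = __rdtsc();
	if (t1 <= t0)
		return 0;
	/* ticks elapsed / seconds elapsed */
	return (uint64_t)((double)(c1 - c0) * 1e9 / (double)(t1 - t0));
#else
	return 0;
#endif
}
```

Once the frequency is known, converting a TSC delta to nanoseconds is `delta * 1e9 / hz`. A real implementation would probably calibrate over a longer window or read the frequency from elsewhere.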
> 
> What sort of timing accuracy is expected from the app?
> 
> From benchmarking the maximum single-threaded rate of these reads:
> 
>  x86_64:
> 
>    read       7 ns/op
>    read_sync  22 ns/op
> 
>  A57:
> 
>    read       4 ns/op
>    read_sync  26 ns/op
> 
> read_sync issues a synchronizing instruction for greater timing accuracy
> but clearly takes more time to return the time value read from the core.
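The benchmark code is not shown in the thread, so the sketch below is only a guess at what "read" and "read_sync" could mean on each architecture. It assumes LFENCE on x86 and ISB on AArch64 (reading the generic timer `cntvct_el0`) as the synchronizing instruction. Other architectures fall back to `clock_gettime()`. The names `counter_read()`, `counter_read_sync()` and `ns_per_op()` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Plain read: may execute out of order with surrounding instructions. */
static inline uint64_t counter_read(void)
{
#if defined(__x86_64__) || defined(__i386__)
	return __rdtsc();
#elif defined(__aarch64__)
	uint64_t v;
	__asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
	return v;
#else
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Synchronized read: a barrier first, so that earlier instructions
 * complete before the counter is sampled. */
static inline uint64_t counter_read_sync(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("lfence" ::: "memory");
	return __rdtsc();
#elif defined(__aarch64__)
	uint64_t v;
	__asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
	return v;
#else
	return counter_read();
#endif
}

/* Rough single-threaded cost of one call to fn, in ns per call. */
static double ns_per_op(uint64_t (*fn)(void), unsigned iters)
{
	struct timespec a, b;
	volatile uint64_t sink = 0; /* keeps the reads from being optimized out */

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (unsigned i = 0; i < iters; i++)
		sink += fn();
	clock_gettime(CLOCK_MONOTONIC, &b);
	(void)sink;
	double ns = (double)(b.tv_sec - a.tv_sec) * 1e9 +
		    (double)(b.tv_nsec - a.tv_nsec);
	return ns / iters;
}
```

Comparing `ns_per_op(counter_read, 1000000)` with `ns_per_op(counter_read_sync, 1000000)` gives numbers of the same kind as the table above. Calling through a function pointer adds some overhead of its own, so absolute values are not directly comparable.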

Accuracy is as good as the implementation can offer with reasonable overhead. We
do not put any nanosecond figures into the API spec. The ODP API should offer
the application the most efficient way to read time anyway.

This patch does not take a position on how the TSC should be read. There are
three options: rdtsc, rdtsc + barrier, and rdtscp. I think the current code is
accurate enough. A barrier adds slight overhead, and rdtscp is not as widely
supported as rdtsc. This detail is an order of magnitude less significant than
the choice between a system call and a direct TSC read, so it can be tuned
later. This patch set also helps if rdtscp is adopted later on, since it
introduces x86 CPU flags.
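The three options could look roughly like the sketch below. It assumes GCC or Clang x86 intrinsics and takes the "barrier" to be LFENCE. RDTSCP support is reported by CPUID leaf 0x80000001, EDX bit 27. Choosing the reader at run time from that flag is only an illustration of how the new CPU flags might be used. None of these names (`tsc_read_fenced()`, `cpu_has_rdtscp()`, `tsc_select_reader()`, ...) come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* Option 1: plain RDTSC. */
static inline uint64_t tsc_read(void)
{
	return __rdtsc();
}

/* Option 2: RDTSC preceded by a barrier (assumed here to be LFENCE). */
static inline uint64_t tsc_read_fenced(void)
{
	__asm__ volatile("lfence" ::: "memory");
	return __rdtsc();
}

/* Option 3: RDTSCP waits for prior instructions to execute and also
 * returns IA32_TSC_AUX in aux (ignored here). Only valid if supported. */
static inline uint64_t tsc_read_rdtscp(void)
{
	unsigned int aux;

	return __rdtscp(&aux);
}

/* CPUID.80000001H:EDX bit 27 reports RDTSCP support. */
static bool cpu_has_rdtscp(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
		return false;
	return (edx & (1u << 27)) != 0;
}

typedef uint64_t (*tsc_read_fn)(void);

/* Hypothetical run-time selection: use RDTSCP when the CPU flag is set,
 * otherwise fall back to the fenced RDTSC. */
static tsc_read_fn tsc_select_reader(void)
{
	return cpu_has_rdtscp() ? tsc_read_rdtscp : tsc_read_fenced;
}
#endif
```

Selecting once at init keeps the per-read cost to one indirect call. Alternatively, the choice could be made at build time.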

-Petri