On 04/26 07:30:15, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> 
> > > > This function (cpu_global_time()) is called only when we have first
> > checked that TSC is invariant. Also we measure the TSC frequency in that
> > case. This function is defined in the same file as cpu_cycles(), and the
> > file is x86 specific. So, we know what we are doing, and just re-using the
> > code to read TSC.
> > 
> > What sort of timing accuracy is expected from the app?
> > 
> > From benchmarking the maximum single-threaded rate of these reads:
> > 
> >  x86_64:
> > 
> >    read       7 ns/op
> >    read_sync  22 ns/op
> > 
> >  A57:
> > 
> >    read       4 ns/op
> >    read_sync  26 ns/op
> > 
> > read_sync issues a synchronizing instruction for greater timing accuracy
> > but clearly takes more time to return the time value read from the core.
> 
> Accuracy is as good as implementation can offer with reasonable overhead. We 
> do not put any nsec figures into API spec. ODP API should offer application 
> the most efficient way to read time anyway.

'reasonable' is what we need to define.

Another reason why you're seeing a performance boost on x86 is that when
switching from clock_gettime() to RDTSC, you're no longer issuing a
synchronizing instruction (fence). As shown above, this can be a significant
factor depending on how often the time is being sampled.

However, there is a loss in timing accuracy: without a synchronizing
instruction the core may execute the counter read out of program order, so
the value returned may not correspond to the point in the code where the
read appears. A synchronizing instruction fixes that, but it slows down the
execution of the thread on the core...
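
To make the trade-off concrete, here is a minimal sketch of the three ways
of reading the TSC. It assumes GCC or Clang on x86 and the intrinsics from
<x86intrin.h>. The function names are mine for illustration and are not
taken from the patch.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc(), __rdtscp(), _mm_lfence(); x86 only */

/* Plain read: cheapest, but the core may execute RDTSC before earlier
 * instructions in program order have completed. */
static inline uint64_t tsc_read(void)
{
	return __rdtsc();
}

/* Fenced read: LFENCE immediately before RDTSC is the usual idiom for
 * making the read wait until earlier instructions have executed. */
static inline uint64_t tsc_read_sync(void)
{
	_mm_lfence();
	return __rdtsc();
}

/* RDTSCP: waits for earlier instructions to execute before reading the
 * counter. Only valid on CPUs that advertise the rdtscp flag, so a real
 * caller would check that flag first. */
static inline uint64_t tsc_read_rdtscp(void)
{
	unsigned int aux; /* receives IA32_TSC_AUX; unused here */

	return __rdtscp(&aux);
}
```

The first variant corresponds to "read" in the numbers above, the second to
"read_sync". Which one is 'reasonable' depends on whether the app cares more
about the cost per read or about where in the instruction stream the sample
is taken.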

> This patch does not take a position which way TSC should be read. There are 
> three options: rdtsc, rdtsc + barrier, rdtscp. I think the current code is 
> good enough for the accuracy. Barrier adds slight overhead. Rdtscp is not as 
> widely supported as rdtsc. This detail is a magnitude less significant 
> compared to: use system call vs direct TSC read. It can be tuned later. This 
> patch set helps if rdtscp should be used later on (introduces x86 cpu flags).

So you're saying that you do not need the synchronizing instruction, and the
loss of timing accuracy is OK, right?

> -Petri
> 
