Re: [FFmpeg-devel] [PATCH] aarch64: Use cntvct_el0 as timer register on Android

Martin Storsjö Fri, 14 Jun 2024 03:58:24 -0700

On Fri, 14 Jun 2024, Zhao Zhili wrote:

On Jun 13, 2024, at 20:54, Martin Storsjö <[email protected]> wrote:

On Fri, 7 Jun 2024, Martin Storsjö wrote:

The default timer register pmccntr_el0 usually requires enabling
access with e.g. a kernel module.
---
cntvct_el0 has significantly better resolution than
av_gettime_relative (while the unscaled nanosecond output of
clock_gettime is much higher resolution).

In one tested case, the cntvct_el0 timer has a frequency of 25 MHz
(readable via the register cntfrq_el0).
---
libavutil/aarch64/timer.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/libavutil/aarch64/timer.h b/libavutil/aarch64/timer.h
index fadc9568f8..966f17081a 100644
--- a/libavutil/aarch64/timer.h
+++ b/libavutil/aarch64/timer.h
@@ -33,7 +33,16 @@ static inline uint64_t read_time(void)
   uint64_t cycle_counter;
   __asm__ volatile(
       "isb                   \t\n"
+#if defined(__ANDROID__)
+        // cntvct_el0 has lower resolution than pmccntr_el0, but is usually
+        // accessible from user space by default.
+        "mrs %0, cntvct_el0        "
+#else
+        // pmccntr_el0 has higher resolution, but is usually not accessible
+        // from user space by default (but access can be enabled with a custom
+        // kernel module).
       "mrs %0, pmccntr_el0       "
+#endif
       : "=r"(cycle_counter) :: "memory" );


Zhao, does this implementation seem useful to you? Does it give you better 
(more accurate, less noisy?) benchmarking numbers on Android, than the fallback 
based on clock_gettime?


Hi Martin, this works on Android and macOS both, so maybe you can enable it for 
macOS too.

I have compared the results of this implementation and mach_absolute_time; it looks like this implementation has a smaller deviation than mach_absolute_time. I guess the result is the same when compared to clock_gettime.

Right, it does seem to use the same scale as mach_absolute_time - but it probably has less overhead when we can fetch it by just reading a register, instead of calling out to a system function.
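
As a rough illustration of reading the counter directly (a sketch, not part of the patch): the sketch_* helper names below are made up for this example, and the clock_gettime fallback on non-aarch64 targets is only there so that the snippet builds everywhere. It assumes that cntvct_el0 and cntfrq_el0 are readable from user space, as in the tested case.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime in strict C modes */
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read the virtual counter with a single register read. */
static inline uint64_t sketch_read_ticks(void)
{
#if defined(__aarch64__)
    uint64_t ticks;
    /* isb keeps the counter read from being executed ahead of earlier code. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(ticks) :: "memory");
    return ticks;
#else
    /* Fallback for other architectures: nanoseconds from a system call. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: counter frequency in Hz (cntfrq_el0 on aarch64). */
static inline uint64_t sketch_ticks_per_second(void)
{
#if defined(__aarch64__)
    uint64_t freq;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(freq));
    return freq;
#else
    return UINT64_C(1000000000); /* the fallback above already counts ns */
#endif
}

/* Convert ticks to nanoseconds; split to avoid overflow for large counts. */
static inline uint64_t sketch_ticks_to_ns(uint64_t ticks, uint64_t freq)
{
    uint64_t sec = ticks / freq;
    uint64_t rem = ticks % freq;
    return sec * UINT64_C(1000000000) + rem * UINT64_C(1000000000) / freq;
}
```

With the 25 MHz frequency mentioned in the patch description, one tick corresponds to 40 ns, so sketch_ticks_to_ns(1, 25000000) gives 40.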

So then I guess I could extend this patch to enable it for defined(__APPLE__) too.

We have linux perf on Android, and kperf on macOS. Linux perf has the benefit of reducing interference from other processes in the statistical results, if I understand correctly.

Yes, possibly, but on the other hand, it also has a bit more noise and overhead than just using pmccntr_el0; e.g. when tuning and comparing small differences in functions, pmccntr_el0 usually gives the best results.

But anyway, as those are configurable, users building with linux perf will get that, and users disabling it will get the more accurate register instead.

I’m not sure about the benefit of macOS kperf.

macOS kperf gives the best and most accurate numbers you can get, on that HW, but unfortunately, it's undocumented and unofficial (and requires running with sudo). It does give numbers comparable to linux perf, I think, i.e. proper clock cycle level numbers.


// Martin
_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".
