I’m trying to write a cross-platform function that gives access to the CPU’s prefetch instructions, such as prefetcht0/t1/t2 and prefetchnta on x86, and the equivalents on AArch64. I’ve found that the GDC and LDC compilers provide builtin magic functions for this, which is what I need. I am trying to put together a detailed plain-English spec for the respective builtins.

My questions:

Q1) I need to compare the specs for the "locality" parameter of the GCC and LDC builtin magic functions. Can anyone tell me whether GDC and LDC have kept mutual compatibility here?

Q2) Could someone help me turn the GCC and LDC specs into plain English regarding the locality parameter? See (2) and (4) below.

Q3) Does the locality parameter determine which _level_ of the data cache hierarchy the data is fetched into? Or is the data always fetched into the L1 data cache and the outer levels, with this parameter only affecting the caches’ _future behaviour_?

Q4) Will these magic builtins work on AArch64?

Here’s what I’ve found so far:

1. GCC builtin published by the D runtime:
   import gcc.simd : prefetch;
   prefetch!( rw, locality )( p );

2. GCC: __builtin_prefetch (const void *addr, ...)
“This function is used to minimize cache-miss latency by moving data into a cache before it is accessed. You can insert calls to __builtin_prefetch into code for which you know addresses of data in memory that is likely to be accessed soon. If the target supports them, data prefetch instructions are generated. If the prefetch is done early enough before the access then the data will be in the cache by the time it is accessed. The value of addr is the address of the memory to prefetch. There are two optional arguments, rw and locality. The value of rw is a compile-time constant one or zero; one means that the prefetch is preparing for a write to the memory address and zero, the default, means that the prefetch is preparing for a read. The value locality must be a compile-time constant integer between zero and three. A value of zero means that the data has no temporal locality, so it need not be left in the cache after the access. A value of three means that the data has a high degree of temporal locality and should be left in all levels of cache possible. Values of one and two mean, respectively, a low or moderate degree of temporal locality. The default is three.”
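
To make the quoted parameters concrete, here is how I read the C-level call. This is only a minimal sketch in C (not D), assuming a GCC-compatible compiler; the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are made up for illustration and the distance is not tuned for any real CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constant: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sums an array while hinting that elements further on will be read soon.
   rw = 0 (read) and locality = 3 (high temporal locality) are the
   documented defaults, written out explicitly here. */
long sum_with_prefetch(const long *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; both rw and locality are literal constants because the spec requires compile-time constants.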

3. declare void @llvm.prefetch(ptr <address>, i32 <rw>, i32 <locality>, i32 <cache type>)

4. Regarding llvm.prefetch() I found the following spec:
“rw is the specifier determining if the fetch should be for a read (0) or write (1), and locality is a temporal locality specifier ranging from (0) - no locality, to (3) - extremely local keep in cache. The cache type specifies whether the prefetch is performed on the data (1) or instruction (0) cache. The rw, locality and cache type arguments must be constant integers.”
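
Both quoted specs number rw as 0 = read, 1 = write and locality as 0 to 3, so the wrapper I have in mind might look roughly like the following. Again this is a sketch in C rather than D, assuming a GCC-compatible compiler; the enum names, the two macros, `LOOKAHEAD` and `copy_streaming` are all hypothetical, and whether each compiler treats the four values identically is exactly what I am asking about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the four locality values in the quoted specs. */
enum {
    LOCALITY_NONE = 0,     /* no temporal locality */
    LOCALITY_LOW = 1,
    LOCALITY_MODERATE = 2,
    LOCALITY_HIGH = 3      /* keep in cache */
};

/* Macros rather than functions, so that rw and locality remain
   compile-time constants at the point of the builtin call. */
#define PREFETCH_FOR_READ(addr, locality)  __builtin_prefetch((addr), 0, (locality))
#define PREFETCH_FOR_WRITE(addr, locality) __builtin_prefetch((addr), 1, (locality))

/* Hypothetical look-ahead distance in elements, not tuned. */
#define LOOKAHEAD 8

/* Copies src to dst, hinting that the source is read once and that the
   destination will be written and used again. */
void copy_streaming(int *dst, const int *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (i + LOOKAHEAD < n) {
            PREFETCH_FOR_READ(&src[i + LOOKAHEAD], LOCALITY_NONE);
            PREFETCH_FOR_WRITE(&dst[i + LOOKAHEAD], LOCALITY_HIGH);
        }
        dst[i] = src[i];
    }
}
```

A D version would presumably pass the same two values as template or intrinsic arguments, which is why I need to know that the numbering means the same thing under both compilers.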

5. I also found this snippet: https://dlang.org/phobos/core_builtins.html. It is great for the syntax of the call to the LDC builtin, but the call for GDC is no good, as it lacks the parameters that I want. This D runtime routine might benefit from accepting all the parameters that GCC’s prefetch builtin takes.

Many thanks in advance.
