Hi Shaopeng,
On 3/2/26 07:26, Shaopeng Tan (Fujitsu) wrote:
> Hello Ben,
>
> Thank you for your reply.
>
> I've made the fixes and re-run the tests on Grace, as you advised.
> I appreciate your feedback.
>
>> This is only guaranteed to clean and invalidate to the point of
>> coherence, PoC. On Grace I expect this is L3/slc and so the cache line
>> there in L3/slc is likely not invalidated or pushed to DRAM.
>> The dsb() for synchronization is missing for aarch64 in sb().
>
> I added dsb() for synchronization for aarch64 as shown below.
>
> @@ -27,6 +30,8 @@ static void sb(void)
> #if defined(__i386) || defined(__x86_64)
> asm volatile("sfence\n\t"
> : : : "memory");
> +#elif defined(__aarch64__)
> + __asm__ __volatile__("dsb sy\n\t" ::: "memory");
> #endif
> }
Sorry if I wasn't clear. The dsb() is required to synchronize the clean
and invalidate operation, but that operation is only guaranteed to reach
the PoC. If the L3/slc is the PoC, as I expect on Grace, there is no
requirement to clean and invalidate the L3/slc itself, so the operation
probably only cleans and invalidates up to L2.
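To make the distinction concrete, here is a rough sketch of the flush
path as I understand it. It is not your actual patch: cl_flush(),
mem_flush() and CL_SIZE are names I am assuming for illustration, and
I am assuming a 64-byte line, a line-aligned buffer, and a kernel that
permits "dc civac" from userspace (Linux normally does).

```c
#include <assert.h>
#include <stddef.h>

#define CL_SIZE 64 /* assumed cache line size in bytes */

/* Clean and invalidate the line containing p, where the arch supports it. */
static inline void cl_flush(void *p)
{
#if defined(__i386) || defined(__x86_64)
	__asm__ __volatile__("clflush (%0)\n\t" : : "r"(p) : "memory");
#elif defined(__aarch64__)
	/* By VA to the PoC only; caches at or beyond the PoC may keep the line. */
	__asm__ __volatile__("dc civac, %0\n\t" : : "r"(p) : "memory");
#else
	(void)p; /* no-op on other architectures */
#endif
}

/* Barrier that orders/completes the maintenance operations issued above. */
static inline void sb(void)
{
#if defined(__i386) || defined(__x86_64)
	__asm__ __volatile__("sfence\n\t" : : : "memory");
#elif defined(__aarch64__)
	__asm__ __volatile__("dsb sy\n\t" : : : "memory");
#endif
}

/* Flush every line of buf, then wait once for the flushes to complete. */
static void mem_flush(unsigned char *buf, size_t buf_size)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < buf_size; i += CL_SIZE)
		cl_flush(&buf[i]);
	sb();
}
```

The dsb makes sure the "dc civac" operations have completed, but it
does not make them reach any further than the PoC.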
>
>> IIUC the L3 cache is in the nvidia interconnect and so changing the
>> cache portion bitmap would correlate with events from the nvidia
>> interconnect pmu. However, I don't think you are using events from the
>> interconnect.
>
> I used the NVIDIA event "nvidia_scf_pmu/scf_cache_refill/".
>
> After the above fixes, the running results are as follows:
> $ sudo ./resctrl_tests -t cat
> TAP version 13
> # Pass: Check kernel supports resctrl filesystem
> # Pass: Check resctrl mountpoint "/sys/fs/resctrl" exists
> # resctrl filesystem not mounted
> 1..3
> # Starting L3_CAT test ...
> # Mounting resctrl to "/sys/fs/resctrl"
> # Cache size :119537664
> # Writing benchmark parameters to resctrl FS
> # Write schema "L3:1=fc0" to resctrl FS
> # Write schema "L3:1=3f" to resctrl FS
> # Write schema "L3:1=fe0" to resctrl FS
> # Write schema "L3:1=1f" to resctrl FS
> # Write schema "L3:1=ff0" to resctrl FS
> # Write schema "L3:1=f" to resctrl FS
> # Write schema "L3:1=ff8" to resctrl FS
> # Write schema "L3:1=7" to resctrl FS
> # Write schema "L3:1=ffc" to resctrl FS
> # Write schema "L3:1=3" to resctrl FS
> # Write schema "L3:1=ffe" to resctrl FS
> # Write schema "L3:1=1" to resctrl FS
> # Checking for pass/fail
> # Number of bits: 6
> # Average LLC val: 0
> # Cache span (lines): 933888
> # Number of bits: 5
> # Average LLC val: 0
> # Cache span (lines): 778240
> # Number of bits: 4
> # Average LLC val: 0
> # Cache span (lines): 622592
> # Number of bits: 3
> # Average LLC val: 0
> # Cache span (lines): 466944
> # Number of bits: 2
> # Average LLC val: 0
> # Cache span (lines): 311296
> # Number of bits: 1
> # Average LLC val: 0
> # Cache span (lines): 155648
> ok 1 L3_CAT: test
>
> The result of the nvidia_scf_pmu/scf_cache_refill event is 0.
> I have tried various changes to the perf_event_open() parameters, such as
> type, read_format, PID etc..
> Although non-zero results were obtained for some parameter combinations, the
> expected results were not achieved in any scenario.
Could this be because the clean and invalidate doesn't affect the slc/L3?
> Are there any special specifications needed for the perf_event_open()
> parameters for Grace or Arm architecture?
I'm not sure.
>
> The perf_event_open() parameters used when collecting the above results are
> as follows:
> perf_event_open({type=PERF_TYPE_RAW, size=0x88 /* PERF_ATTR_SIZE_??? */,
> config=0xf1, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER,
> read_format=PERF_FORMAT_GROUP, disabled=1, inherit=1, exclude_kernel=1,
> exclude_hv=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1,
> exclude_callchain_kernel=1, ...}, 68508, 1, -1, PERF_FLAG_FD_CLOEXEC) = 3
> Could you please give us your opinion?
>
> Also, since this kselftest is for all Arm chips, we need an event common to
> all chips.
> Do you have any ideas on what event we should collect?
I don't think there is any common event. Perhaps the event to test
against could be made an input to the test?
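As a very rough sketch of what I mean by an input: the test could take
a string such as "nvidia_scf_pmu:0xf1" and split it into a PMU name and
a raw config. The "<pmu>:<config>" format, struct llc_event_spec,
parse_event_spec() and pmu_type_path() are all made up here for
illustration. The only thing I'm relying on from the kernel is that
dynamic PMUs export their perf type number in
/sys/bus/event_source/devices/<pmu>/type.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PMU_NAME_MAX 64 /* hypothetical limit on the PMU name length */

struct llc_event_spec {
	char pmu[PMU_NAME_MAX];	/* PMU name as it appears in sysfs */
	uint64_t config;	/* raw event number for that PMU */
};

/* Parse "<pmu>:<config>". Returns 0 on success, -EINVAL on bad input. */
static int parse_event_spec(const char *spec, struct llc_event_spec *out)
{
	unsigned long long config;
	const char *sep;
	char *end;
	size_t len;

	assert(spec != NULL && out != NULL);

	sep = strchr(spec, ':');
	if (!sep)
		return -EINVAL;

	len = (size_t)(sep - spec);
	if (len == 0 || len >= PMU_NAME_MAX)
		return -EINVAL;

	/* Reject empty, signed or whitespace-prefixed numbers up front. */
	if (!isdigit((unsigned char)sep[1]))
		return -EINVAL;

	errno = 0;
	config = strtoull(sep + 1, &end, 0); /* base 0: accepts 0x prefix */
	if (errno || *end != '\0')
		return -EINVAL;

	memcpy(out->pmu, spec, len);
	out->pmu[len] = '\0';
	out->config = config;
	return 0;
}

/* Build the sysfs path holding the PMU's perf_event_attr.type value. */
static int pmu_type_path(const struct llc_event_spec *ev, char *buf,
			 size_t size)
{
	int n = snprintf(buf, size, "/sys/bus/event_source/devices/%s/type",
			 ev->pmu);

	return (n < 0 || (size_t)n >= size) ? -EINVAL : 0;
}
```

The test would then read the type from that path and pass it, together
with config, to perf_event_open(). I haven't tried this on Grace.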
>
> Best regards,
> Shaopeng TAN
Thanks,
Ben