Hi Shivank,
Thanks a lot for the comments and findings. I've fixed the build and plan to
update the patch set soon.
On 1/9/2024 9:46 AM, Garg, Shivank wrote:
> Hi Artem,
>
> I hope this message finds you well.
> I've encountered a compilation issue when KERNEL_REPLICATION is disabled in
> the config.
>
> ld: vmlinux.o: in function `alloc_insn_page':
> /home/amd/linux_mainline/arch/x86/kernel/kprobes/core.c:425: undefined
> reference to `numa_set_memory_rox'
> ld: vmlinux.o: in function `alloc_new_pack':
> /home/amd/linux_mainline/kernel/bpf/core.c:873: undefined reference to
> `numa_set_memory_rox'
> ld: vmlinux.o: in function `bpf_prog_pack_alloc':
> /home/amd/linux_mainline/kernel/bpf/core.c:891: undefined reference to
> `numa_set_memory_rox'
> ld: vmlinux.o: in function `bpf_trampoline_update':
> /home/amd/linux_mainline/kernel/bpf/trampoline.c:447: undefined reference to
> `numa_set_memory_rox'
> ld: vmlinux.o: in function `bpf_struct_ops_map_update_elem':
> /home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:515: undefined reference
> to `numa_set_memory_rox'
> ld: vmlinux.o:/home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:524: more
> undefined references to `numa_set_memory_rox' follow
>
>
> After some investigation, I've put together a patch that resolves this
> compilation issues for me.
>
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2268,6 +2268,15 @@ int numa_set_memory_nonglobal(unsigned long addr, int
> numpages)
>
> return ret;
> }
> +
> +#else
> +
> +int numa_set_memory_rox(unsigned long addr, int numpages)
> +{
> + return set_memory_rox(addr, numpages);
> +
> +}
> +
> #endif
>
> Additionally, I'm interested in evaluating the performance impact of this
> patchset on AMD processors.
> Could you please point me the benchmarks that you have used in cover letter?
>
> Best Regards,
> Shivank
>
Regarding the benchmarks: for now we used a self-implemented test that
generates a system call load.
We used the RedHawk Linux approach as a reference, as described in the article
"An Overview of Kernel Text Page Replication in RedHawk™ Linux® 6.3":
https://concurrent-rt.com/wp-content/uploads/2020/12/kernel-page-replication.pdf
The test is very simple.
All measured system calls were invoked through the syscall() wrapper from
glibc, i.e.:
    #include <sys/syscall.h> /* Definition of SYS_* constants */
    #include <unistd.h>
    long syscall(long number, ...);
The individual test cases are:
fork/1
The system call was invoked exactly once. The time between entering and
exiting the system call was measured.
fork/1024
The system call was invoked in a loop 1024 times. The time between entering
the loop and exiting it was measured.
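
For illustration only, a minimal sketch of how the fork/1 and fork/1024
measurements described above could be taken. This is not the test source
itself: now_ns(), do_fork() and time_fork() are hypothetical names, and
CLOCK_MONOTONIC, the fallback to fork() where SYS_fork is missing, and reaping
the children outside the timed region are all assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_* constants */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds (assumed clock source). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* fork via syscall() where SYS_fork exists, otherwise via glibc fork(). */
static pid_t do_fork(void)
{
#ifdef SYS_fork
    return (pid_t)syscall(SYS_fork);
#else
    return fork();
#endif
}

/*
 * Hypothetical helper: time `iterations` fork calls as one interval.
 * iterations == 1 corresponds to fork/1, 1024 to fork/1024.
 * Returns 0 on success, -1 if any fork failed.
 */
static int time_fork(int iterations, uint64_t *elapsed_ns)
{
    int created = 0;
    uint64_t start;

    assert(iterations > 0 && elapsed_ns != NULL);

    start = now_ns();
    for (int i = 0; i < iterations; i++) {
        pid_t pid = do_fork();

        if (pid == 0)
            _exit(0);          /* child: leave immediately */
        if (pid < 0)
            break;
        created++;
    }
    *elapsed_ns = now_ns() - start;

    /* Reap the children outside the timed region. */
    for (int i = 0; i < created; i++)
        wait(NULL);

    return created == iterations ? 0 : -1;
}
```

Calling time_fork(1, &ns) and time_fork(1024, &ns) would give the two figures;
the second may need a sufficiently high RLIMIT_NPROC, since the children are
only reaped after the loop.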
mmap/munmap
A set of 1024 pages (PAGE_SIZE is taken as 4096 if it is not defined) was
mapped with the mmap syscall and unmapped with munmap. One page was mapped
and unmapped per loop iteration.
mmap/lock
The same as above, but with the MAP_LOCKED flag added.
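
A rough sketch of the mmap/munmap and mmap/lock loops, under stated
assumptions: the mapping type (private, anonymous, read/write) is not given
above and is chosen here only for illustration, the glibc mmap()/munmap()
wrappers stand in for the syscall() invocation, and now_ns() and
time_mmap_munmap() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096         /* fallback, as in the description above */
#endif

/* Monotonic timestamp in nanoseconds (assumed clock source). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: map and unmap one page per iteration.
 * extra_flags == 0 models mmap/munmap, MAP_LOCKED models mmap/lock.
 * Returns 0 on success, -1 if a mapping or unmapping failed.
 */
static int time_mmap_munmap(int pages, int extra_flags, uint64_t *elapsed_ns)
{
    uint64_t start;

    assert(pages > 0 && elapsed_ns != NULL);

    start = now_ns();
    for (int i = 0; i < pages; i++) {
        /* Assumed mapping type: private anonymous, read/write. */
        void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);

        if (p == MAP_FAILED)
            return -1;
        if (munmap(p, PAGE_SIZE) != 0)
            return -1;
    }
    *elapsed_ns = now_ns() - start;
    return 0;
}
```

time_mmap_munmap(1024, 0, &ns) would correspond to mmap/munmap and
time_mmap_munmap(1024, MAP_LOCKED, &ns) to mmap/lock; the locked variant can
fail if RLIMIT_MEMLOCK is very low.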
open/close
The /dev/null pseudo-file was opened and closed in a loop 1024 times, once
per iteration.
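
The open/close loop could be sketched along these lines. The O_RDONLY flag,
the use of the glibc open()/close() wrappers, and the names now_ns() and
time_open_close() are assumptions made for illustration, not taken from the
test itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds (assumed clock source). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: open and close /dev/null once per iteration and
 * time the whole loop. Returns 0 on success, -1 on any failure.
 */
static int time_open_close(int iterations, uint64_t *elapsed_ns)
{
    uint64_t start;

    assert(iterations > 0 && elapsed_ns != NULL);

    start = now_ns();
    for (int i = 0; i < iterations; i++) {
        int fd = open("/dev/null", O_RDONLY);   /* assumed access mode */

        if (fd < 0)
            return -1;
        if (close(fd) != 0)
            return -1;
    }
    *elapsed_ns = now_ns() - start;
    return 0;
}
```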
mount
The procfs pseudo-filesystem was mounted once on a temporary directory inside
/tmp.
The time between entering and exiting the system call was measured.
kill
A signal handler for SIGUSR1 was set up. The signal was sent to a child
process created with the glibc fork wrapper.
The time between sending and receiving the SIGUSR1 signal was measured.
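
One possible way to take the kill measurement is sketched below. The
description above does not say how the send and receive timestamps are
brought together, so everything here is an assumption: the child records the
receive time in its handler and passes it back over a pipe, SIGUSR1 is blocked
before fork() so that sigsuspend() cannot miss it, and now_ns(), on_sigusr1()
and time_kill() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile uint64_t received_ns;   /* set by the handler in the child */

/* Monotonic timestamp in nanoseconds; clock_gettime is async-signal-safe. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void on_sigusr1(int signo)
{
    (void)signo;
    received_ns = now_ns();
}

/* Hypothetical helper: time from kill() in the parent to the child's handler. */
static int time_kill(uint64_t *elapsed_ns)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    uint64_t sent, recv = 0;
    ssize_t n = -1;
    int fds[2];
    pid_t pid;

    assert(elapsed_ns != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;

    /* Block SIGUSR1 so it stays pending until the child calls sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);

    pid = fork();
    if (pid == 0) {
        uint64_t t;

        close(fds[0]);
        sigsuspend(&wait_mask);          /* returns after the handler ran */
        t = received_ns;
        _exit(write(fds[1], &t, sizeof(t)) == (ssize_t)sizeof(t) ? 0 : 1);
    }

    close(fds[1]);
    if (pid > 0) {
        sent = now_ns();
        if (kill(pid, SIGUSR1) == 0)
            n = read(fds[0], &recv, sizeof(recv));
        waitpid(pid, NULL, 0);
    }
    close(fds[0]);
    sigprocmask(SIG_SETMASK, &old, NULL);

    if (n != (ssize_t)sizeof(recv))
        return -1;
    *elapsed_ns = recv - sent;
    return 0;
}
```

With this layout the interval also includes the time the child needs to be
scheduled and reach sigsuspend(), so it is an upper bound on signal delivery
latency rather than a precise figure.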
Testing environment:
Processor: Intel(R) Xeon(R) CPU E5-2690
2 nodes, 12 CPU cores each.
Best Regards,
Artem