On 1/25/2024 7:30 AM, Garg, Shivank wrote:
> Hi Artem,
>
>> Preliminary performance evaluation results:
>> Processor Intel(R) Xeon(R) CPU E5-2690
>> 2 NUMA nodes with 12 CPU cores each
>>
>> fork/1 - A single invocation of the system call is timed.
>>          The measurement is taken between entering and exiting the system call.
>>
>> fork/1024 - The system call is invoked in a loop 1024 times.
>>             The time between entering the loop and exiting it was measured.
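
For reference, roughly what the fork testcase looks like; N=1 gives fork/1 and
N=1024 gives fork/1024. This is a simplified sketch: the now_ns() helper and
the in-loop waitpid() are illustrative, and the actual harness may reap
children differently.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

/* Monotonic clock read in nanoseconds. */
static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(int argc, char **argv)
{
        int i, n = argc > 1 ? atoi(argv[1]) : 1;        /* 1 or 1024 */
        uint64_t t0 = now_ns();

        for (i = 0; i < n; i++) {
                pid_t pid = fork();

                if (pid == 0)
                        _exit(0);               /* child exits immediately */
                waitpid(pid, NULL, 0);          /* reap before next iteration */
        }
        printf("fork x%d: %llu ns\n", n, (unsigned long long)(now_ns() - t0));
        return 0;
}
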
>>
>> mmap/munmap - A set of 1024 pages (PAGE_SIZE each, falling back to 4096 if
>>               PAGE_SIZE is not defined) was mapped using the mmap syscall
>>               and unmapped using munmap. One page is mapped/unmapped per
>>               loop iteration.
>>
>> mmap/lock - The same as above, but with the MAP_LOCKED flag added.
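
A minimal sketch of the mmap/munmap testcase; adding MAP_LOCKED to the flags
turns it into the mmap/lock variant. The PAGE_SIZE fallback matches the
description above, and now_ns() is the same illustrative helper as in the
fork sketch.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <sys/mman.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096          /* fallback if not defined by headers */
#endif

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
        /* Add MAP_LOCKED here for the mmap/lock variant. */
        int i, flags = MAP_PRIVATE | MAP_ANONYMOUS;
        uint64_t t0 = now_ns();

        for (i = 0; i < 1024; i++) {
                void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                               flags, -1, 0);

                if (p == MAP_FAILED)
                        return 1;
                munmap(p, PAGE_SIZE);   /* one map/unmap pair per iteration */
        }
        printf("mmap/munmap x1024: %llu ns\n",
               (unsigned long long)(now_ns() - t0));
        return 0;
}
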
>>
>> open/close - The /dev/null pseudo-file was opened and closed in a loop
>>              1024 times, once per iteration.
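
A minimal sketch of the open/close testcase, again with the same illustrative
now_ns() helper:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
        int i;
        uint64_t t0 = now_ns();

        for (i = 0; i < 1024; i++) {
                int fd = open("/dev/null", O_RDONLY);

                if (fd < 0)
                        return 1;
                close(fd);      /* one open/close pair per iteration */
        }
        printf("open/close x1024: %llu ns\n",
               (unsigned long long)(now_ns() - t0));
        return 0;
}
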
>>
>> mount - The procfs pseudo-filesystem was mounted once on a temporary
>>         directory inside /tmp.
>>         The time between entering and exiting the system call was measured.
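
A minimal sketch of the mount testcase (must run as root; the mkdtemp() mount
point and the cleanup are illustrative details, not necessarily what the
harness does):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <sys/mount.h>

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
        char dir[] = "/tmp/proc-XXXXXX";       /* temporary mount point */
        uint64_t t0, t1;

        if (!mkdtemp(dir))
                return 1;
        t0 = now_ns();
        if (mount("proc", dir, "proc", 0, NULL))        /* the timed syscall */
                return 1;
        t1 = now_ns();
        printf("mount(proc): %llu ns\n", (unsigned long long)(t1 - t0));
        umount(dir);            /* cleanup, outside the timed region */
        return 0;
}
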
>>
>> kill - A signal handler for SIGUSR1 was set up. The signal was sent to a
>>        child process created using glibc's fork wrapper. The time between
>>        sending and receiving the SIGUSR1 signal was measured.
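
A minimal sketch of the kill testcase. CLOCK_MONOTONIC is system-wide, so the
child's receive timestamp, stored in a shared anonymous mapping, can be
compared against the parent's send timestamp. The sleep() is a crude stand-in
for whatever handshake the real harness uses to make sure the child is
blocked in pause() before the signal is sent.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

static volatile uint64_t *t_recv;       /* shared parent/child timestamp */

static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

static void handler(int sig)
{
        (void)sig;
        *t_recv = now_ns();     /* child records the receive time */
}

int main(void)
{
        uint64_t t_send;
        pid_t pid;

        t_recv = mmap(NULL, sizeof(*t_recv), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (t_recv == MAP_FAILED)
                return 1;
        *t_recv = 0;
        signal(SIGUSR1, handler);

        pid = fork();
        if (pid == 0) {         /* child: block until the signal arrives */
                pause();
                _exit(0);
        }
        sleep(1);               /* crude: let the child reach pause() */
        t_send = now_ns();
        kill(pid, SIGUSR1);
        waitpid(pid, NULL, 0);
        printf("kill->handler: %llu ns\n",
               (unsigned long long)(*t_recv - t_send));
        return 0;
}
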
>>
>> Hot caches:
>>
>> fork-1          2.3%
>> fork-1024       10.8%
>> mmap/munmap     0.4%
>> mmap/lock       4.2%
>> open/close      3.2%
>> kill            4.0%
>> mount           8.7%
>>
>> Cold caches:
>>
>> fork-1          42.7%
>> fork-1024       17.1%
>> mmap/munmap     0.4%
>> mmap/lock       1.5%
>> open/close      0.4%
>> kill            26.1%
>> mount           4.1%
>>
> I've conducted some testing on an AMD EPYC 7713 64-core processor (dual
> socket, 2 NUMA nodes, 64 CPUs per node) to evaluate the performance with
> this patchset.
> I've implemented the syscall-based testcases as suggested in your previous
> mail. I'm shielding the 2nd NUMA node using isolcpus and nohz_full, and
> executing the tests on CPUs belonging to that node.
>
> Performance Evaluation results (% gain over base kernel 6.5.0-rc5):
>
> Hot caches:
> fork-1         1.1%
> fork-1024     -3.8%
> mmap/munmap   -1.5%
> mmap/lock     -4.7%
> open/close    -6.8%
> kill           3.3%
> mount        -13.0%
>
> Cold caches:
> fork-1         1.2%
> fork-1024     -7.2%
> mmap/munmap   -1.6%
> mmap/lock     -1.0%
> open/close     4.6%
> kill         -54.2%
> mount         -8.5%
>
> Thanks,
> Shivank
>
Hi Shivank, thank you for the performance evaluation. Unfortunately, we don't
have an AMD EPYC system right now; I'll try to find a way to perform the
measurements and clarify why the results differ so much.

We are currently working on a performance evaluation using database-related
benchmarks and will follow up with the results once we have clarified the
difference.

BR

