In !HIGHMEM configurations, especially on 64-bit architectures, pages
do not need temporary mappings. k(map|unmap)_atomic() therefore amounts
to nothing more than a series of barrier() calls. For example, for a 2MB
hugepage, clear_huge_page() maps and unmaps each of the 512 subpages,
which adds up to 2048 barrier calls in total. This calls for
optimization: simply getting the virtual address from the page through
the kmap_local_* APIs does the job. We profiled clear_huge_page() using
ftrace and observed an improvement of 62%.

Setup:-
The data below was collected on Qualcomm's SM7250 SoC with THP enabled
(kernel v4.19.113), with only CPU-0 (Cortex-A55) and CPU-7 (Cortex-A76)
switched on and set to max frequency, and with DDR set to the perf
governor.

FTRACE Data:-

Base data:-
Number of iterations: 48
Mean of allocation time: 349.5 us
std deviation: 74.5 us

v1 data:-
Number of iterations: 48
Mean of allocation time: 131 us
std deviation: 32.7 us

The following simple userspace experiment, which allocates 100MB (BUF_SZ)
of pages and writes to each of them, gave us good insight: we observed an
improvement of 42% in allocation and writing timings.
-------------------------------------------------------------
Test code snippet
-------------------------------------------------------------
      clock_start();
      buf = malloc(BUF_SZ); /* Allocate 100 MB of memory */

      /* Touch one word per page so that every page is faulted in */
      for (i = 0; i < BUF_SZ_PAGES; i++) {
              *((int *)(buf + (i * PAGE_SIZE))) = 1;
      }
      clock_end();
-------------------------------------------------------------

Malloc test timings for 100MB anon allocation:-

Base data:-
Number of iterations: 100
Mean of allocation time: 31831 us
std deviation: 4286 us

v1 data:-
Number of iterations: 100
Mean of allocation time: 18193 us
std deviation: 4915 us

Reported-by: Chintan Pandya <chintan.pan...@oneplus.com>
Signed-off-by: Prathu Baronia <prathu.baro...@oneplus.com>
---
 include/linux/highmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index d2c70d3772a3..444df139b489 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -146,9 +146,9 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 #ifndef clear_user_highpage
 static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
 {
-	void *addr = kmap_atomic(page);
+	void *addr = kmap_local_page(page);
 	clear_user_page(addr, vaddr, page);
-	kunmap_atomic(addr);
+	kunmap_local(addr);
 }
 #endif
 
-- 
2.17.1