Right now, the SMMU driver uses dma_alloc_coherent() to get memory for its
queues and tables. Typically, on ARM64 servers, there is a default CMA area
located on node0, which could be far away from node2, node3, etc.
With this patch, the SMMU will get memory from its local NUMA node to hold
command queues and page tables, which should reduce dma_unmap latency
considerably. Meanwhile, when iommu.passthrough is on, device drivers that
call dma_alloc_coherent() will also get local memory and avoid cross-node
memory accesses.
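The node-selection policy described above can be sketched as a toy user-space
model. This is not the kernel's code: MAX_NODES, NO_NODE, struct toy_cma and
the toy_* functions are hypothetical stand-ins, and the model deliberately
ignores details of the real allocator such as page alignment and GFP flags.
It only illustrates the idea: try the CMA area of the device's own node first,
then fall back to the default area on node0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. MAX_NODES, NO_NODE and
 * struct toy_cma are hypothetical stand-ins for the per-node CMA
 * areas reserved at boot by dma_pernuma_cma_reserve(). */
#define MAX_NODES 4
#define NO_NODE (-1)

struct toy_cma {
	int node;       /* NUMA node the area's memory lives on */
	size_t free;    /* bytes still available in the area */
};

static struct toy_cma pernuma_area[MAX_NODES];
static int have_pernuma[MAX_NODES];
static struct toy_cma default_area;   /* models the default CMA on node0 */

/* Stand-in for boot-time reservation: one area per node plus the default. */
static void toy_reserve(size_t per_node, size_t dflt)
{
	for (int nid = 0; nid < MAX_NODES; nid++) {
		pernuma_area[nid].node = nid;
		pernuma_area[nid].free = per_node;
		have_pernuma[nid] = 1;
	}
	default_area.node = 0;
	default_area.free = dflt;
}

/* Carve size bytes out of an area; returns the area's node or NO_NODE. */
static int toy_cma_alloc(struct toy_cma *cma, size_t size)
{
	if (cma->free < size)
		return NO_NODE;
	cma->free -= size;
	return cma->node;
}

/* Try the device's local area first, then fall back to the default
 * area. Returns the node the memory came from, or NO_NODE on failure. */
static int toy_alloc_coherent(int dev_node, size_t size)
{
	if (dev_node != NO_NODE) {
		assert(dev_node >= 0 && dev_node < MAX_NODES);
		if (have_pernuma[dev_node]) {
			int got = toy_cma_alloc(&pernuma_area[dev_node], size);
			if (got != NO_NODE)
				return got;
		}
	}
	return toy_cma_alloc(&default_area, size);
}
```

In this model a device on node2 is served from node2 until its local area is
exhausted, and only then does it fall back to node0, which is the remote
access this patch aims to avoid in the common case.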

Acked-by: Will Deacon <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Ganapatrao Kulkarni <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Nicolas Saenz Julienne <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Barry Song <[email protected]>
---
 -v7: add Will's acked-by

 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 481d22c32a2e..f1c75957ff3c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -429,6 +429,8 @@ void __init bootmem_init(void)
        arm64_hugetlb_cma_reserve();
 #endif
 
+       dma_pernuma_cma_reserve();
+
        /*
         * sparse_init() tries to allocate memory from memblock, so must be
         * done after the fixed reservations
-- 
2.27.0