Hi,

numpy often allocates large arrays, and one factor in its performance is the cost of faulting that memory from the kernel into the process. This cost is relatively significant; in the following operation on large arrays, for example, it accounts for 10-15% of the runtime:
    import numpy as np
    a = np.ones(10000000)
    b = np.ones(10000000)
    %timeit (a * b)**2 + 3

    54.45%  ipython  umath.so           [.] sse2_binary_multiply_DOUBLE
    20.43%  ipython  umath.so           [.] DOUBLE_add
    16.66%  ipython  [kernel.kallsyms]  [k] clear_page

The reason for this is that the glibc memory allocator serves large allocations with fresh memory mappings instead of reusing memory that has already been faulted in. It does this so that memory is returned to the system as soon as it is freed, which keeps the whole system more robust. That makes a lot of sense in general, but not so much for numerical applications, which are often the only thing running on a machine.
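As an aside, the size above which glibc switches to mmap() is tunable, which is one way to see the effect; a minimal sketch using glibc's mallopt() interface (the 64 MiB value is an arbitrary choice for illustration):

    #include <malloc.h>   /* mallopt, M_MMAP_THRESHOLD (glibc) */
    #include <stdlib.h>

    int main(void)
    {
        /* Allocations below the threshold are served from the heap and
           stay faulted in after free(); allocations above it get a
           fresh mmap() and are unmapped, thus unfaulted, on free(). */
        mallopt(M_MMAP_THRESHOLD, 64 * 1024 * 1024);

        /* 10,000,000 doubles as in the example above: now below the
           threshold, so a free/malloc cycle reuses faulted memory. */
        double *buf = malloc(10000000 * sizeof(double));
        free(buf);
        return 0;
    }

Of course, raising the threshold avoids the faulting cost only by giving up returning memory to the system, which is exactly the tradeoff we would like to escape.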
However, despite an old paper having shown that caching memory inside numpy speeds up many applications, numpy's usage is so diverse that we could not really diverge from the glibc behaviour. That changed when Linux 4.5 added support for madvise(MADV_FREE). This flag of the madvise syscall tells the kernel that a piece of memory may be reclaimed for other processes if there is memory pressure. Should another process claim the memory and the original process then use it again, the kernel faults fresh pages into its place, so it behaves exactly as if it had just been freed regularly. But when no other process claims the memory and the original process wants to reuse it, the memory does not need to be faulted again. So effectively this flag allows us to cache memory inside numpy that can still be taken back by the rest of the system when required. Doing so gives the expected speedup in the example above.
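For reference, the mechanics look roughly like this; a minimal sketch, assuming Linux >= 4.5 (and a libc that exposes MADV_FREE), with error handling omitted:

    #include <sys/mman.h>   /* mmap, madvise, MADV_FREE (Linux >= 4.5) */
    #include <string.h>

    int main(void)
    {
        size_t n = 10000000 * sizeof(double);  /* as in the example above */

        /* Anonymous mapping, the kind glibc creates for large malloc()s. */
        void *buf = mmap(NULL, n, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(buf, 1, n);   /* touching the pages is the fault cost */

        /* "Free": keep the mapping but let the kernel reclaim the
           pages under memory pressure. */
        madvise(buf, n, MADV_FREE);

        /* "Reallocate": just reuse the pointer.  If nobody needed the
           memory in between there are no new faults; if the kernel did
           reclaim it, fresh pages are faulted in transparently, exactly
           as after a regular allocation. */
        memset(buf, 1, n);
        return 0;
    }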
An issue is that the memory usage of numpy applications will appear to increase: memory that is effectively free will still show up in the usual places you look at memory usage, namely the resident memory of the process in top, /proc, etc. The reported usage only goes down when the memory is actually needed by other processes. This would probably break some memory profiling tools, so we likely need a switch that allows profilers to disable the caching.

Another concern is that this is really the job of the system memory allocator, but I had a look at glibc's allocator and it does not look like an easy job to retrofit good use of MADV_FREE into it, so I don't expect that to happen anytime soon.

Should it be agreed that caching is worthwhile, I would propose a very simple implementation: we only really need to cache a small handful of array data pointers for the fast allocate-deallocate cycles that appear in common numpy usage. For example, a small list of maybe 4 pointers storing the 4 largest recent deallocations, where a new allocation just picks the first cached block of sufficient size; see the sketch below. The cache would only be active on systems that support MADV_FREE (which is Linux 4.5, and probably the BSDs too).
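A minimal sketch of such a cache (the names are made up for illustration, not existing numpy API, and eviction keeping only the largest recent blocks is omitted for brevity):

    #include <sys/mman.h>   /* madvise, MADV_FREE */
    #include <stddef.h>

    #define ALLOC_CACHE_SLOTS 4   /* a small handful of slots is enough */

    /* Hypothetical cache; a real version would also honor a switch
       (e.g. an environment variable) so that memory profiling tools
       can disable the caching entirely. */
    static struct { void *ptr; size_t size; } cache[ALLOC_CACHE_SLOTS];

    /* Called instead of freeing a large block: stash it and tell the
       kernel it may take the pages back under memory pressure. */
    static int cache_put(void *p, size_t n)
    {
        for (int i = 0; i < ALLOC_CACHE_SLOTS; i++) {
            if (cache[i].ptr == NULL) {
                madvise(p, n, MADV_FREE);
                cache[i].ptr = p;
                cache[i].size = n;
                return 1;
            }
        }
        return 0;   /* cache full: caller frees normally */
    }

    /* Called before allocating: the first cached block of sufficient
       size is reused, with no new faults if its pages survived. */
    static void *cache_get(size_t n)
    {
        for (int i = 0; i < ALLOC_CACHE_SLOTS; i++) {
            if (cache[i].ptr != NULL && cache[i].size >= n) {
                void *p = cache[i].ptr;
                cache[i].ptr = NULL;
                return p;
            }
        }
        return NULL;   /* miss: caller allocates normally */
    }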
So what do you think of this idea?

cheers,
Julian