On 2016-10-19 11:16:16 [-0700], Davidlohr Bueso wrote:
> On Mon, 17 Oct 2016, Sebastian Andrzej Siewior wrote:
>
> > By default the application uses malloc() and all available CPUs. This
> > patch introduces NUMA support which means:
> > - memory is allocated node local via numa_alloc_local()
> > - all CPUs of the specified NUMA node are used. This is also true if the
> >   number of threads set is greater than the number of CPUs available on
> >   this node.
>
> Can't we just use numactl to bind cpus and memory to be node-local?

Something like
	numactl --cpunodebind=$NODE --membind=$NODE perf …
?
This should work for memory. However, since we use
	pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu);
we would need to query the affinity mask and deploy the threads based on
that mask.

Using NUMA support within this bench tool also has the side effect that the
output lists all the options used.

> Thanks,
> Davidlohr

Sebastian
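
For illustration, querying the affinity mask and deploying threads based on
it could look roughly like the sketch below. This is not code from the
patch: allowed_cpus(), pick_cpu(), spawn_pinned(), worker() and MAX_THREADS
are hypothetical names, and the round-robin placement is an assumption. Only
sched_getaffinity() and pthread_attr_setaffinity_np() are the real glibc
calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define MAX_THREADS 64	/* hypothetical upper bound for this sketch */

/*
 * Hypothetical helper: collect the CPUs this process is allowed to run on,
 * e.g. as restricted by "numactl --cpunodebind". Returns the number of
 * CPUs stored in cpus[], or -1 on error.
 */
static int allowed_cpus(int *cpus, int max)
{
	cpu_set_t mask;
	int n = 0;

	CPU_ZERO(&mask);
	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE && n < max; cpu++)
		if (CPU_ISSET(cpu, &mask))
			cpus[n++] = cpu;
	return n;
}

/* Hypothetical policy: spread threads round-robin over the allowed CPUs. */
static int pick_cpu(const int *cpus, int ncpus, int thread_idx)
{
	assert(ncpus > 0);
	return cpus[thread_idx % ncpus];
}

/* Placeholder for the real benchmark work. */
static void *worker(void *arg)
{
	return arg;
}

/*
 * Start nthreads workers, each pinned to one CPU taken from the inherited
 * affinity mask, then join them. Returns nthreads on success, -1 otherwise.
 */
static int spawn_pinned(int nthreads)
{
	int cpus[CPU_SETSIZE];
	pthread_t tids[MAX_THREADS];
	int ncpus = allowed_cpus(cpus, CPU_SETSIZE);
	int started = 0;

	if (ncpus <= 0 || nthreads < 0 || nthreads > MAX_THREADS)
		return -1;

	for (int i = 0; i < nthreads; i++) {
		pthread_attr_t thread_attr;
		cpu_set_t cpu;
		int ret;

		CPU_ZERO(&cpu);
		CPU_SET(pick_cpu(cpus, ncpus, i), &cpu);

		pthread_attr_init(&thread_attr);
		ret = pthread_attr_setaffinity_np(&thread_attr,
						  sizeof(cpu_set_t), &cpu);
		if (!ret)
			ret = pthread_create(&tids[i], &thread_attr,
					     worker, NULL);
		pthread_attr_destroy(&thread_attr);
		if (ret)
			break;
		started++;
	}

	/* Join whatever was started, even on partial failure. */
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return started == nthreads ? started : -1;
}
```

With more threads than allowed CPUs, the modulo in pick_cpu() simply wraps
around, so several threads share a CPU of the bound node.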