On Thu, May 07, 2020 at 05:24:17PM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> > > Yes, it's indeed OMP. By low thread counts, I mean up to 2x the
> > > number of NUMA nodes (8 threads on 4-NUMA-node servers, 16 threads
> > > on 8-NUMA-node servers).
> >
> > Ok, so we know it's within the imbalance threshold where a NUMA node can
> > be left idle.
>
> Today my colleagues and I discussed the performance drop that some
> workloads see at low thread counts (roughly up to 2x the number of NUMA
> nodes). We are worried that it could be a severe issue for some use
> cases which require full memory bandwidth even when only part of the
> CPUs is in use.
>
> We understand that the scheduler cannot automatically distinguish this
> type of workload from others. However, there was an idea for a *new
> kernel tunable to control the imbalance threshold*. Based on the
> purpose of the server, users could set this tunable. See the tuned
> project, which allows creating performance profiles [1].
>
I'm not completely opposed to it, but given that the setting is global, I
imagine it could have other consequences if two applications run at
different times have different requirements. Given that it's OMP, I would
have imagined that an application which really cared about this would
specify what it needs using OMP_PLACES. Why would someone prefer kernel
tuning or a tuned profile over OMP_PLACES? After all, it requires specific
knowledge of the application even to know that a particular tuned profile
is needed.

-- 
Mel Gorman
SUSE Labs
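For context on the OMP_PLACES alternative mentioned above: the OpenMP runtime can spread an application's threads across packages from the environment, with no kernel change. A minimal sketch using the standard OMP_PLACES/OMP_PROC_BIND variables (the application name in the comment is hypothetical):

```shell
# Spread 8 OpenMP threads across CPU packages rather than relying on
# the kernel load balancer to do so. These are standard OpenMP
# environment variables; the application itself is hypothetical.
export OMP_NUM_THREADS=8
export OMP_PLACES=sockets      # one place per physical package
export OMP_PROC_BIND=spread    # distribute threads across those places
# ./memory_bandwidth_app       # would now get one thread group per socket
echo "$OMP_PLACES/$OMP_PROC_BIND"
```

With OMP_PROC_BIND=spread, threads are bound as far apart as possible within the place list, so a bandwidth-hungry workload touches every package even at low thread counts.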