> While running a database workload, we found a scalability issue with itimers.
>
> Much of the problem was caused by the thread_group_cputimer spinlock.
> Each time we account for group system/user time, we need to obtain a
> thread_group_cputimer's spinlock to update the timers. On larger systems
> (such as a 16 socket machine), this caused more than 30% of total time
> spent trying to obtain this kernel lock to update these group timer stats.

FYI, another cache line problem encountered by Mel:

a368ab67aa mm: move zone lock to a different cache line than order-0 free page lists

> This patch converts the timers to 64 bit atomic variables and uses
> atomic add to update them without a lock. With this patch, the percent
> of total time spent updating thread group cputimer timers was reduced
> from 30% down to less than 1%.
>
> Note: On 32 bit systems using the generic 64 bit atomics, this causes
> sample_group_cputimer() to take locks 3 times instead of just 1 time.
> However, we tested this patch on a 32 bit ARM system using the
> generic atomics and did not find the overhead to be much of an issue.
> An explanation for why this isn't an issue is that 32 bit systems usually
> have small numbers of CPUs, and cacheline contention from extra spinlocks
> called periodically is not really apparent on smaller systems.
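
For anyone following along, here is a rough userspace sketch of the idea in the quoted description: each counter becomes a 64 bit atomic that is updated with an atomic add, and sampling reads each counter separately, with no lock around the group. This is not the patch itself. It uses C11 <stdatomic.h> rather than the kernel's atomic64 API, and the struct and function names (group_cputime_atomic, group_add_utime, group_sample, and so on) are made up for illustration. The third counter is my assumption, based on the note that sampling takes three locks with the generic atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the group timer stats; not the kernel struct. */
struct group_cputime_atomic {
    _Atomic uint64_t utime;            /* accumulated user time */
    _Atomic uint64_t stime;            /* accumulated system time */
    _Atomic uint64_t sum_exec_runtime; /* assumed third counter */
};

/* Plain snapshot handed back to the caller. */
struct group_cputime_sample {
    uint64_t utime;
    uint64_t stime;
    uint64_t sum_exec_runtime;
};

static inline void group_cputime_init(struct group_cputime_atomic *t)
{
    atomic_init(&t->utime, 0);
    atomic_init(&t->stime, 0);
    atomic_init(&t->sum_exec_runtime, 0);
}

/* Update paths: one atomic add each, no spinlock taken. */
static inline void group_add_utime(struct group_cputime_atomic *t, uint64_t delta)
{
    atomic_fetch_add_explicit(&t->utime, delta, memory_order_relaxed);
}

static inline void group_add_stime(struct group_cputime_atomic *t, uint64_t delta)
{
    atomic_fetch_add_explicit(&t->stime, delta, memory_order_relaxed);
}

static inline void group_add_runtime(struct group_cputime_atomic *t, uint64_t delta)
{
    atomic_fetch_add_explicit(&t->sum_exec_runtime, delta, memory_order_relaxed);
}

/*
 * Sample path: three independent atomic loads. The three values are not
 * read as one consistent snapshot; each load is atomic on its own.
 */
static inline void group_sample(struct group_cputime_atomic *t,
                                struct group_cputime_sample *out)
{
    out->utime = atomic_load_explicit(&t->utime, memory_order_relaxed);
    out->stime = atomic_load_explicit(&t->stime, memory_order_relaxed);
    out->sum_exec_runtime =
        atomic_load_explicit(&t->sum_exec_runtime, memory_order_relaxed);
}
```

On a platform without native 64 bit atomics, each of those three loads may fall back to a lock internally, which would match the "3 times instead of just 1 time" note in the quoted text.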

