On 04/19/2018 04:39 PM, Alexey Dobriyan wrote:
>>
>> Yes, that can probably help.
>>
>> This is the data from the problematic skylake server:
>>
>> model name : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
>> 56 sosreport-carevalo.02076935-20180413085327/proc/stat
>> Interrupts: 5370
>> Interrupts without "0" entries: 1011
>>
>> There are still quite a large number of non-zero entries, though.
>>
>>> Or maintain array of registered irqs and iterate over them only.
>> Right, we can allocate a bitmap of used irqs to do that.
>>
>>> I have another idea.
>>>
>>> perf record shows mutex_lock/mutex_unlock at the top.
>>> Most of them are irq mutex not seqfile mutex as there are many more
>>> interrupts than reads. Take it once.
>>>
>> How many cpus are in your test system? In that skylake server, it was
>> the per-cpu summing operation of the irq counts that was consuming most
>> of the time for reading /proc/stat. I think we can certainly try to
>> optimize the lock taking.
> It's 16x(NR_IRQS: 4352, nr_irqs: 960, preallocated irqs: 16)
> Given that irq registering is rare operation, maintaining sorted array
> of irq should be the best option.
BTW, the skylake server is 2-socket 24-core 48-thread.

Cheers,
Longman