Blair, methinks someone is doing bitcoin mining on your systems when they are idle :-)
I WAS going to say that maybe the cpupower utility needs an update to cope with that generation of CPUs. But /proc/cpuinfo never lies (does it?)

On 16 May 2018 at 13:22, Blair Bethwaite <blair.bethwa...@gmail.com> wrote:
> On 15 May 2018 at 08:45, Wido den Hollander <w...@42on.com> wrote:
>>
>> > We've got some Skylake Ubuntu based hypervisors that we can look at to
>> > compare tomorrow...
>> >
>>
>> Awesome!
>
>
> Ok, so results still inconclusive I'm afraid...
>
> The Ubuntu machines we're looking at (Dell R740s and C6420s running with
> Performance BIOS power profile, which amongst other things disables cstates
> and enables turbo) are currently running either a 4.13 or a 4.15 HWE kernel
> - we needed 4.13 to support PERC10 and even get them booting from local
> storage, then 4.15 to get around a prlimit bug that was breaking Nova
> snapshots, so here we are. Where are you getting 4.16,
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16/ ?
>
> So interestingly in our case we seem to have no cpufreq driver loaded.
> After installing linux-generic-tools (cause cpupower is supposed to
> supersede cpufrequtils I think?):
>
> rr42-03:~$ uname -a
> Linux rcgpudc1rr42-03 4.15.0-13-generic #14~16.04.1-Ubuntu SMP Sat Mar 17
> 03:04:59 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> rr42-03:~$ cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-4.15.0-13-generic root=/dev/mapper/vg00-root ro
> intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=1
>
> rr42-03:~$ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Byte Order: Little Endian
> CPU(s): 36
> On-line CPU(s) list: 0-35
> Thread(s) per core: 1
> Core(s) per socket: 18
> Socket(s): 2
> NUMA node(s): 2
> Vendor ID: GenuineIntel
> CPU family: 6
> Model: 85
> Model name: Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping: 4
> CPU MHz: 3400.956
> BogoMIPS: 5401.45
> Virtualization: VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache: 1024K
> L3 cache: 25344K
> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64
> monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca
> sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c
> rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
> invpcid_single pti intel_ppin mba tpr_shadow vnmi flexpriority ept vpid
> fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a
> avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw
> avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
> cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke
>
> rr42-03:~$ sudo cpupower frequency-info
> analyzing CPU 0:
> no or unknown cpufreq driver is active on this CPU
> CPUs which run at the same hardware frequency: Not Available
> CPUs which need to have their frequency coordinated by software: Not
> Available
> maximum transition latency: Cannot determine or is not supported.
> Not Available
> available cpufreq governors: Not Available
> Unable to determine current policy
> current CPU frequency: Unable to call hardware
> current CPU frequency: Unable to call to kernel
> boost state support:
> Supported: yes
> Active: yes
>
>
> And of course there is nothing under sysfs (/sys/devices/system/cpu*). But
> /proc/cpuinfo and cpupower-monitor show that we seem to be hitting turbo
> freqs:
>
> rr42-03:~$ sudo cpupower monitor
>               |Nehalem                    || Mperf
> PKG |CORE|CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq
>    0|   0|   0|  0.00|  0.00|  0.00|  0.00||  0.05| 99.95| 3391
>    0|   1|   4|  0.00|  0.00|  0.00|  0.00||  0.02| 99.98| 3389
>    0|   2|   8|  0.00|  0.00|  0.00|  0.00||  0.14| 99.86| 3067
>    0|   3|   6|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3385
>    0|   4|   2|  0.00|  0.00|  0.00|  0.00||  0.09| 99.91| 3119
>    0|   8|  12|  0.00|  0.00|  0.00|  0.00||  0.03| 99.97| 3312
>    0|   9|  16|  0.00|  0.00|  0.00|  0.00||  0.11| 99.89| 3157
>    0|  10|  14|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3352
>    0|  11|  10|  0.00|  0.00|  0.00|  0.00||  0.05| 99.95| 3390
>    0|  16|  20|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3387
>    0|  17|  24|  0.00|  0.00|  0.00|  0.00||  0.22| 99.78| 3115
>    0|  18|  26|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3389
>    0|  19|  22|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3366
>    0|  20|  18|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3392
>    0|  24|  28|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3376
>    0|  25|  32|  0.00|  0.00|  0.00|  0.00||  0.05| 99.95| 3390
>    0|  26|  34|  0.00|  0.00|  0.00|  0.00||  0.03| 99.97| 3391
>    0|  27|  30|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3392
>    1|   0|   1|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3394
>    1|   1|   5|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3378
>    1|   2|   9|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3393
>    1|   3|   7|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3384
>    1|   4|   3|  0.00|  0.00|  0.00|  0.00||  0.02| 99.98| 3391
>    1|   8|  13|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3390
>    1|   9|  17|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3391
>    1|  10|  15|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3360
>    1|  11|  11|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3393
>    1|  16|  21|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3397
>    1|  17|  25|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3391
>    1|  18|  27|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3376
>    1|  19|  23|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3334
>    1|  20|  19|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3387
>    1|  24|  29|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3377
>    1|  25|  33|  0.00|  0.00|  0.00|  0.00||  0.01| 99.99| 3387
>    1|  26|  35|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3392
>    1|  27|  31|  0.00|  0.00|  0.00|  0.00||  0.00|100.00| 3392
>
>
> On a similar node with the 4.13 kernel we get similar reports from
> cpupower-monitor, but oddly on 4.13 /proc/cpuinfo shows all cores at base
> 2700.000 (on 4.15 it updates).
>
> We can try 4.16 tomorrow. But I wonder why we are already seeing turbo
> even at idle and you aren't... only thing I can think of is that it must be
> because our cstates are disabled in BIOS, indeed when looking in dmesg I
> see:
>
> [    1.274325] intel_idle: disabled
>
> So it stands to reason that intel_idle.max_cstate=0 is doing nothing for
> either of us. What do you see from intel_idle on 4.16?
>
> --
> Cheers,
> ~Blairo
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
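For anyone wanting to repeat these checks without relying on cpupower, here is a minimal Python sketch. It assumes the standard sysfs layout (a cpuN/cpufreq/scaling_driver file only exists when a cpufreq driver is bound), the msr kernel module (/dev/cpu/N/msr, root only), and the usual APERF/MPERF method: while a core is in C0, IA32_MPERF (0xE7) ticks at a fixed reference rate and IA32_APERF (0xE8) at the actual clock, so base * dAPERF / dMPERF is the average clock while busy, and dMPERF / dTSC is the busy fraction. That last part assumes MPERF and the TSC share a reference rate, as they normally do on parts with constant_tsc and nonstop_tsc (both in the flags above). The helper names are my own, and I have not checked that this is exactly what cpupower's Mperf monitor computes.

```python
import os
import struct
from pathlib import Path
from typing import Optional

# Architectural MSR addresses (Intel SDM).
IA32_MPERF = 0xE7  # ticks at a fixed reference rate, only while the core is in C0
IA32_APERF = 0xE8  # ticks at the actual core clock, only while the core is in C0


def cpufreq_driver(cpu: int = 0, sysfs_root: str = "/sys/devices/system/cpu") -> Optional[str]:
    """Return the cpufreq scaling driver bound to `cpu`, or None if there is none.

    With no cpufreq driver loaded, the cpuN/cpufreq directory does not exist,
    which matches the empty sysfs described above.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_driver"
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR through the msr driver (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the register address as the file offset.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def effective_mhz(base_mhz: float, aperf_delta: int, mperf_delta: int) -> float:
    """Average clock while in C0 over a sampling window: base * dAPERF / dMPERF."""
    if mperf_delta <= 0:
        raise ValueError("MPERF did not advance: the core stayed idle for the whole window")
    return base_mhz * aperf_delta / mperf_delta


def c0_percent(mperf_delta: int, tsc_delta: int) -> float:
    """Share of the window spent in C0: MPERF only ticks in C0, the TSC keeps ticking."""
    return 100.0 * mperf_delta / tsc_delta
```

To use it, take read_msr(0, IA32_APERF) and read_msr(0, IA32_MPERF) a second apart and pass the differences to effective_mhz(2700, ...), 2700 being the Gold 6150 base clock from lscpu; made-up deltas of 1256 and 1000 give 3391.2 MHz. If that is roughly how the Freq column is derived, the ~3.4 GHz figures sitting next to C0 values of 0.0x% describe the brief moments each core was awake, not cores parked at turbo while idle.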