Hi Jacob, Zhang, 

One of your recent commits, "thermal/powerclamp: remove cpu whitelist" [1], has 
caused a regression in the kernel. 

That commit changed powerclamp_probe from requiring all of the following 
features:

X86_FEATURE_NONSTOP_TSC
X86_FEATURE_CONSTANT_TSC
X86_FEATURE_MWAIT
X86_FEATURE_ARAT           

to requiring *any* of them. The problem is that clamp_thread still calls 
mwait_idle_with_hints even when the CPU does not support MWAIT. 
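
To make the difference concrete, here is a minimal userspace sketch. It is not 
the driver code: the FEAT_* flags and both helper functions are hypothetical 
stand-ins for the X86_FEATURE_* bits, and the sketch only models "all required" 
versus "any present" matching.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the X86_FEATURE_* bits; not kernel definitions. */
enum {
    FEAT_NONSTOP_TSC  = 1u << 0,
    FEAT_CONSTANT_TSC = 1u << 1,
    FEAT_MWAIT        = 1u << 2,
    FEAT_ARAT         = 1u << 3,
};

#define FEAT_REQUIRED \
    (FEAT_NONSTOP_TSC | FEAT_CONSTANT_TSC | FEAT_MWAIT | FEAT_ARAT)

/* Old behaviour: the probe succeeds only if every listed feature is present. */
static bool probe_requires_all(unsigned int cpu_features)
{
    return (cpu_features & FEAT_REQUIRED) == FEAT_REQUIRED;
}

/* Behaviour after the commit: one matching feature is enough. */
static bool probe_accepts_any(unsigned int cpu_features)
{
    return (cpu_features & FEAT_REQUIRED) != 0;
}
```

For a hypothetical CPU that reports FEAT_CONSTANT_TSC but not FEAT_MWAIT, 
probe_requires_all() returns false while probe_accepts_any() returns true, 
which is the case that lets the driver load and later reach the MWAIT path.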

This was reported by our users when running Ubuntu 16.10 (4.8.0-22-generic) 
inside a VMware VM, though as mentioned above I don’t think it is specific to 
our platform. We have seen kernel panics caused by an invalid opcode exception. 
The stack trace is below for your reference. 

[    5.736416] invalid opcode: 0000 [#1] SMP
[    5.736455] Modules linked in: vmw_vsock_vmci_transport vsock vmw_balloon 
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd intel_rapl_perf 
input_leds joydev serio_raw snd_ens1371 snd_ac97_codec gameport ac97_bus 
snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device 
snd_timer snd soundcore i2c_piix4 shpchp vmw_vmci nfit floppy(+) mac_hid 
parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid 
ahci libahci e1000 mptspi mptscsih psmouse mptbase vmwgfx scsi_transport_spi 
ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pata_acpi 
fjes
[    5.744370] CPU: 1 PID: 912 Comm: kidle_inject/1 Not tainted 
4.8.0-22-generic #24-Ubuntu
[    5.744373] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 07/02/2015
[    5.744375] task: ffff9658f7a663c0 task.stack: ffff9658fa908000
[    5.744378] RIP: 0010:[<ffffffffc05728b8>]  [<ffffffffc05728b8>] 
clamp_thread+0x2b8/0x5d0 [intel_powerclamp]
[    5.744380] RSP: 0018:ffff9658fa90be00  EFLAGS: 00010246
[    5.744383] RAX: ffff9658fa908008 RBX: 00000000fffee0a6 RCX: 0000000000000000
[    5.744386] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[    5.744388] RBP: ffff9658fa90bec0 R08: ffff9658fa908000 R09: 0000000000000000
[    5.744391] R10: 000000000001cbf7 R11: 0000000000000000 R12: ffffffff8db581a0
[    5.744393] R13: ffff9658fa908000 R14: 0000000000000000 R15: ffff9658fa908000
[    5.744396] FS:  0000000000000000(0000) GS:ffff9658fc640000(0000) 
knlGS:0000000000000000
[    5.744398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.744401] CR2: 00007ffa6cc262e8 CR3: 000000003ab3b000 CR4: 00000000001406e0
[    5.744403] Stack:
[    5.744406]  0000000000000001 ffff9658f7a66dc0 ffff9658fc659200 
00000000e878d638
[    5.744409]  0000000000000001 00000002fc659200 0000000000000001 
ffff9658fa908008
[    5.744411]  0000000000000000 ffff9658fc64fea8 00000000fffee0a6 
ffffffffc05720a0
[    5.744414] Call Trace:
[    5.744416]  [<ffffffffc05720a0>] ? pkg_state_counter+0xa0/0xa0 
[intel_powerclamp]
[    5.744419]  [<ffffffffc0572600>] ? powerclamp_set_cur_state+0x170/0x170 
[intel_powerclamp]
[    5.744421]  [<ffffffffc0572600>] ? powerclamp_set_cur_state+0x170/0x170 
[intel_powerclamp]
[    5.744424]  [<ffffffff8cca3c18>] kthread+0xd8/0xf0
[    5.744427]  [<ffffffff8d49f29f>] ret_from_fork+0x1f/0x40
[    5.744429]  [<ffffffff8cca3b40>] ? kthread_create_on_node+0x1e0/0x1e0
[    5.744432] Code: cc e9 ba 00 00 00 eb 19 0f 1f 00 0f ae f0 65 48 8b 04 25 
04 69 01 00 0f ae b8 08 c0 ff ff 0f ae f0 31 d2 48 8b 44 24 38 48 89 d1 <0f> 01 
c8 49 8b 45 08 a8 08 75 0b b9 01 00 00 00 4c 89 f0 0f 01 
[    5.744434] RIP  [<ffffffffc05728b8>] clamp_thread+0x2b8/0x5d0 
[intel_powerclamp]
[    5.744437]  RSP <ffff9658fa90be00>
[    5.744440] invalid opcode: 0000 [#2] SMP
[    5.744452] ---[ end trace cf659c4076bf2804 ]---

Looking at the instruction at the RIP <ffffffffc05728b8> shows that the kernel 
attempted to execute the "monitor" instruction. 

 8b8:   0f 01 c8                monitor %rax,%rcx,%rdx
 8bb:   49 8b 45 08             mov    0x8(%r13),%rax
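
The faulting bytes marked <0f> 01 c8 in the Code: line above are the encoding 
of MONITOR; MWAIT is encoded as 0f 01 c9. As a rough sketch for checking such 
byte sequences by hand, the helper below (a hypothetical name, not a kernel or 
binutils function) recognises just these two encodings.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the mnemonic for the three-byte
 * MONITOR (0f 01 c8) or MWAIT (0f 01 c9) encoding, or NULL otherwise.
 */
static const char *decode_monitor_mwait(const unsigned char *code, size_t len)
{
    if (len < 3 || code[0] != 0x0f || code[1] != 0x01)
        return NULL;

    switch (code[2]) {
    case 0xc8:
        return "monitor";
    case 0xc9:
        return "mwait";
    default:
        return NULL;
    }
}
```

Passing the three bytes at the faulting RIP returns "monitor", matching the 
disassembly shown above.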

To fix this, I think you should restore the explicit feature check (the "if" 
block) that was removed in the above-mentioned commit. Can you please look at 
this?

Thanks,
Alok


[1] b721ca0d192754deccb89fb01c77e41e6fd91ad9
https://github.com/torvalds/linux/commit/b721ca0d192754deccb89fb01c77e41e6fd91ad9
 
