Launchpad has imported 4 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=207519.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2020-04-30T11:00:39+00:00 colin.king wrote:

I originally triggered this with stress-ng V0.11.08 by running:

sudo stress-ng --perf --cpu 1 -t 10

I've since pushed a commit that stops using the TLB flush event, to avoid
this issue for the moment.

I've worked through all the perf event combinations and found that the
kernel panic occurs with the following events:

sudo perf record -e exceptions:page_fault_user,exceptions:page_fault_kernel,tlb:tlb_flush sleep 1
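
A minimal sketch, in Python, of one way to enumerate event combinations
like this. The helper name perf_commands and the idea of trying every
non-empty subset of the three tracepoints are assumptions for
illustration, not necessarily how the combinations above were worked
through. The script only prints the commands, because running the full
three-event command on an affected kernel is reported to panic the
machine.

```python
import itertools
import shlex

# Tracepoints named in the report.
EVENTS = [
    "exceptions:page_fault_user",
    "exceptions:page_fault_kernel",
    "tlb:tlb_flush",
]


def perf_commands(events=EVENTS, workload=("sleep", "1")):
    """Yield one 'perf record' argv per non-empty subset of events.

    Hypothetical helper: subsets are produced smallest first, so the
    last command is the full three-event reproducer from the report.
    """
    for size in range(1, len(events) + 1):
        for subset in itertools.combinations(events, size):
            yield ["sudo", "perf", "record", "-e", ",".join(subset), *workload]


if __name__ == "__main__":
    # Print only; do not execute on a machine you cannot afford to crash.
    for argv in perf_commands():
        print(shlex.join(argv))
```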

Bisecting the kernel I found that this issue occurred when the following
commit landed in the kernel:

commit 763802b53a427ed3cbd419dbba255c414fdd9e7c
Author: Joerg Roedel <jroe...@suse.de>
Date:   Sat Mar 21 18:22:41 2020 -0700

    x86/mm: split vmalloc_sync_all()
    
This is a 100% reproducer; it always happens on x86-64, both in a VM and
on real hardware.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1875941/comments/6

------------------------------------------------------------------------
On 2020-04-30T11:20:20+00:00 colin.king wrote:

Created attachment 288837
full stack dump

The top of the stack dump (attached) shows that it's a stack overflow:


[   22.163398] BUG: stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____))
[   22.165204] kernel stack overflow (double-fault): 0000 [#1] SMP PTI
[   22.166729] CPU: 3 PID: 935 Comm: perf Not tainted 5.4.0-28-generic #32-Ubuntu
[   22.168813] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014
[   22.171263] RIP: 0010:perf_trace_x86_exceptions+0x44/0xf0
[   22.172769] Code: 83 ec 18 48 8b 5f 78 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 65 48 03 1d 00 0c f9 68 48 8b 87 80 00 00 00 48 85 c0 75 08 <48> 8b 03 48 85 c0 74 74 bf 24 00 00 00 48 8d 55 c4 48 8d 75 c8 e8
[   22.176573] RSP: 0018:ffff978f00838020 EFLAGS: 00010046
[   22.177569] RAX: 0000000000000000 RBX: ffffb78effdcab70 RCX: 0000000000000000
[   22.178800] RDX: ffff978f008380b8 RSI: ffffb78effdcab70 RDI: ffffffff9863e620
[   22.179993] RBP: ffff978f00838060 R08: 0000000000000000 R09: 0000000000000000
[   22.181188] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9863e620
[   22.182698] R13: 0000000000000000 R14: ffffb78effdcab70 R15: ffff978f008380b8
[   22.184019] FS:  00007ff4818af780(0000) GS:ffff892b7db80000(0000) knlGS:0000000000000000
[   22.185592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.186732] CR2: ffff978f00837ff8 CR3: 000000007d5d8000 CR4: 00000000000006e0
[   22.188100] Call Trace:
[   22.188689]  do_page_fault+0xca/0xe0
[   22.189493]  do_async_page_fault+0x39/0x70
[   22.190388]  async_page_fault+0x34/0x40
[   22.191233] RIP: 0010:perf_trace_x86_exceptions+0x44/0xf0
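
As a rough cross-check of the "stack guard page was hit" line, the RSP
and CR2 values in the dump can be compared directly. The sketch below
assumes 4 KiB pages (PAGE_SIZE is an assumption, as is the helper name
page_base). The real stack bounds are hashed to (____ptrval____) in the
log, so this only shows that the faulting address sits on the page
directly below the one RSP points into, which is consistent with a
guard-page hit rather than proof of it.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86-64 base page size

rsp = 0xFFFF978F00838020  # RSP from the dump
cr2 = 0xFFFF978F00837FF8  # CR2 (faulting address) from the dump


def page_base(addr):
    """Round an address down to the start of its page (hypothetical helper)."""
    return addr & ~(PAGE_SIZE - 1)


gap = rsp - cr2
adjacent_below = page_base(rsp) - page_base(cr2) == PAGE_SIZE

print(f"RSP - CR2      = {gap} bytes")
print(f"RSP page base  = {page_base(rsp):#x}")
print(f"CR2 page base  = {page_base(cr2):#x}")
print(f"CR2 is on the page directly below RSP: {adjacent_below}")
```

With the values above, the fault address is 40 bytes below RSP and 8
bytes below the page boundary at 0xffff978f00838000.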

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1875941/comments/7

------------------------------------------------------------------------
On 2020-04-30T11:41:52+00:00 colin.king wrote:

Finally got a full stack dump:

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1875941/comments/9

------------------------------------------------------------------------
On 2020-04-30T11:42:23+00:00 colin.king wrote:

This still occurs on 5.7-rc2 and on today's linux-next tip.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1875941/comments/10


** Changed in: linux
       Status: Unknown => Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1875941

Title:
  using perf can crash kernel with a stack overflow

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  New

Bug description:
  Running sudo stress-ng --perf --cpu 1 -t 10 causes the recent
  5.4.0-25-generic kernel to lock up, with no information on the console
  showing where it locked up.

  Bisected this back to:

  commit d44d71bbb9618c526820b39fe1cd0673582dc8c4 (refs/bisect/bad)
  Author: Joerg Roedel <jroe...@suse.de>
  Date:   Sat Mar 21 18:22:41 2020 -0700

      x86/mm: split vmalloc_sync_all()
      
      BugLink: https://bugs.launchpad.net/bugs/1869061
      
      commit 763802b53a427ed3cbd419dbba255c414fdd9e7c upstream.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1875941/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
