[Kernel-packages] [Bug 1733662] Re: System hang with Linux kernel due to mainline commit 24247aeeabe

Rod Smith Tue, 16 Jan 2018 08:36:31 -0800

Joseph,

The first run of your latest kernel completed; however, I noticed the
following in the dmesg output:


[  426.281083] ==================================================================
[  426.286615] BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
[  426.291841] Read of size 8 at addr ffff883ff7c1e780 by task cpuhp/31/195

[  426.302209] CPU: 31 PID: 195 Comm: cpuhp/31 Not tainted 4.13.0-25-generic #29~lp1733662KASANenabled
[  426.302213] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[  426.302215] Call Trace:
[  426.302233]  dump_stack+0xb8/0x12d
[  426.302241]  ? dma_virt_map_sg+0xd3/0xd3
[  426.302252]  ? show_regs_print_info+0x41/0x41
[  426.302263]  print_address_description+0x6f/0x280
[  426.302269]  kasan_report+0x27a/0x370
[  426.302276]  ? find_first_bit+0x1f/0x80
[  426.302288]  __asan_load8+0x54/0x90
[  426.302295]  find_first_bit+0x1f/0x80
[  426.302306]  has_busy_rmid+0x47/0x70
[  426.302314]  intel_rdt_offline_cpu+0x4b4/0x510
[  426.302321]  ? clear_closid_rmid.isra.4+0x70/0x70
[  426.302333]  ? sysfs_remove_group+0x7a/0xc0
[  426.302339]  ? clear_closid_rmid.isra.4+0x70/0x70
[  426.302351]  cpuhp_invoke_callback+0x15f/0x7e0
[  426.302360]  ? cpuhp_kick_ap_work+0x2d0/0x2d0
[  426.302372]  ? __schedule+0x4f1/0xeb0
[  426.302377]  ? cpuhp_kick_ap_work+0x2d0/0x2d0
[  426.302385]  ? firmware_map_remove+0x1b1/0x1b1
[  426.302395]  ? migrate_swap_stop+0x2f0/0x2f0
[  426.302402]  ? firmware_map_remove+0x1b1/0x1b1
[  426.302407]  ? migrate_swap_stop+0x2f0/0x2f0
[  426.302414]  ? schedule+0xd8/0x2a0
[  426.302421]  ? __schedule+0xeb0/0xeb0
[  426.302427]  ? default_wake_function+0x2f/0x40
[  426.302439]  ? __wake_up_common+0xa1/0xc0
[  426.302446]  cpuhp_down_callbacks+0x52/0xa0
[  426.302453]  cpuhp_thread_fun+0x117/0x1a0
[  426.302459]  ? cpu_up+0x20/0x20
[  426.302468]  smpboot_thread_fn+0x20e/0x2f0
[  426.302474]  ? sort_range+0x30/0x30
[  426.302482]  kthread+0x1b7/0x1e0
[  426.302488]  ? sort_range+0x30/0x30
[  426.302493]  ? kthread_create_on_node+0xc0/0xc0
[  426.302500]  ret_from_fork+0x1f/0x30

[  426.307683] Allocated by task 56:
[  426.312817]  save_stack_trace+0x1b/0x20
[  426.312824]  save_stack+0x43/0xd0
[  426.312829]  kasan_kmalloc+0xad/0xe0
[  426.312834]  __kmalloc+0x105/0x230
[  426.312840]  intel_rdt_online_cpu+0x5a8/0x830
[  426.312846]  cpuhp_invoke_callback+0x15f/0x7e0
[  426.312850]  cpuhp_thread_fun+0x8b/0x1a0
[  426.312856]  smpboot_thread_fn+0x20e/0x2f0
[  426.312861]  kthread+0x1b7/0x1e0
[  426.312866]  ret_from_fork+0x1f/0x30

[  426.317887] Freed by task 195:
[  426.322879]  save_stack_trace+0x1b/0x20
[  426.322887]  save_stack+0x43/0xd0
[  426.322891]  kasan_slab_free+0x72/0xc0
[  426.322896]  kfree+0x94/0x1a0
[  426.322902]  intel_rdt_offline_cpu+0x17d/0x510
[  426.322908]  cpuhp_invoke_callback+0x15f/0x7e0
[  426.322912]  cpuhp_down_callbacks+0x52/0xa0
[  426.322917]  cpuhp_thread_fun+0x117/0x1a0
[  426.322925]  smpboot_thread_fn+0x20e/0x2f0
[  426.322929]  kthread+0x1b7/0x1e0
[  426.322935]  ret_from_fork+0x1f/0x30

[  426.327837] The buggy address belongs to the object at ffff883ff7c1e780
                which belongs to the cache kmalloc-8 of size 8
[  426.338289] The buggy address is located 0 bytes inside of
                8-byte region [ffff883ff7c1e780, ffff883ff7c1e788)
[  426.348805] The buggy address belongs to the page:
[  426.354223] page:ffffea00ffdf0780 count:1 mapcount:0 mapping:          (null) index:0x0
[  426.359838] flags: 0x57ffffc0000100(slab)
[  426.365373] raw: 0057ffffc0000100 0000000000000000 0000000000000000 0000000100aa00aa
[  426.371135] raw: dead000000000100 dead000000000200 ffff8817f500fb80 0000000000000000
[  426.377004] page dumped because: kasan: bad access detected

[  426.388626] Memory state around the buggy address:
[  426.394498]  ffff883ff7c1e680: fc fc 00 fc fc fb fc fc 00 fc fc fb fc fc 00 fc
[  426.400634]  ffff883ff7c1e700: fc 00 fc fc fb fc fc 00 fc fc fb fc fc fb fc fc
[  426.406721] >ffff883ff7c1e780: fb fc fc fb fc fc fb fc fc 00 fc fc fb fc fc fb
[  426.412737]                    ^
[  426.418698]  ffff883ff7c1e800: fc fc fb fc fc fb fc fc fb fc fc fb fc fc fb fc
[  426.424961]  ffff883ff7c1e880: fc 00 fc fc fb fc fc fb fc fc fb fc fc fb fc fc
[  426.431154] ==================================================================
[  426.437413] Disabling lock debugging due to kernel taint
[  426.472795] IRQ 8: no longer affine to CPU31
[  426.472806] IRQ 9: no longer affine to CPU31
[  426.472827] IRQ 40: no longer affine to CPU31
[  426.473962] smpboot: CPU 31 is now offline

I ran it several more times without any obvious errors; however, I might
have missed something. (The dmesg output is quite verbose and scrolls by
quickly!)
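
Since reports like the one above are easy to miss in scrolling output, a
saved log can be scanned for them afterwards. The following is a minimal
Python sketch, not an existing tool: the function name find_kasan_reports
and the regular expression are illustrative assumptions, based only on the
"BUG: KASAN: <kind> in <symbol>" line format seen in the trace above.

```python
import re

# Matches lines such as:
#   [  426.286615] BUG: KASAN: use-after-free in find_first_bit+0x1f/0x80
# The pattern is an assumption derived from the report format shown above.
KASAN_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+BUG: KASAN: (\S+) in (\S+)")


def find_kasan_reports(log_text):
    """Return a list of (timestamp, kind, location) tuples, one per KASAN report.

    log_text is the full text of a dmesg capture. Only the headline
    "BUG: KASAN:" line of each report is inspected; the call traces that
    follow are ignored.
    """
    return [
        (float(timestamp), kind, location)
        for timestamp, kind, location in KASAN_RE.findall(log_text)
    ]
```

To use it, save the "dmesg -w" output to a file during the test run, read
the file into a string, and pass that string to find_kasan_reports(); an
empty list means no KASAN headline lines were found.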

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1733662

Title:
  System hang with Linux kernel due to mainline commit 24247aeeabe

Status in linux package in Ubuntu:
  In Progress
Status in linux-hwe package in Ubuntu:
  Confirmed
Status in linux source package in Artful:
  In Progress
Status in linux-hwe source package in Artful:
  Confirmed
Status in linux source package in Bionic:
  In Progress
Status in linux-hwe source package in Bionic:
  Confirmed

Bug description:
  During Ubuntu 17.10 regression testing, we've encountered one computer
  (boldore, a Cisco UCS C240 M4 [VIC]) that hangs in about one of every
  four runs of our cpu_offlining test. This test attempts to take all the
  CPU cores offline except one, then brings them back online again. The
  test ran successfully on boldore with previous releases. Reverting to
  Ubuntu 16.04.3, I found no problems; but when I upgraded the 16.04.3
  installation to linux-image-4.13.0-16-generic, the problem appeared
  again, so I'm confident this is a problem with the kernel. I'm
  attaching two files, dmesg-output-4.10.txt and dmesg-output-4.13.txt,
  which show the dmesg output that appears when running the
  cpu_offlining test with the 4.10.0-38 and 4.13.0-16 kernels,
  respectively; the system hung on the 4.13 run. (I was running
  "dmesg -w" in a second SSH login; the files are cut-and-pasted from
  that.)
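
As a rough illustration of what such a test does, the offline/online cycle
can be sketched in Python as below. This is not the actual cpu_offlining
test script, whose source is not shown here. The helper names
hotpluggable_cpus and set_online and the choice to keep CPU 0 online are
hypothetical; the sketch assumes only the standard sysfs hotplug control
files at /sys/devices/system/cpu/cpuN/online.

```python
import re
from pathlib import Path

# Standard sysfs location of per-CPU directories on Linux.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def hotpluggable_cpus(root=SYSFS_CPU_ROOT, keep=0):
    """Return the cpuN directories that can be toggled, in numeric order.

    CPU number `keep` is skipped so that one core always stays online.
    CPUs without an 'online' control file are skipped too, since they
    cannot be hot-unplugged.
    """
    found = []
    for path in root.iterdir():
        match = re.fullmatch(r"cpu(\d+)", path.name)
        if not match:
            continue  # e.g. cpufreq, cpuidle: not a per-CPU directory
        number = int(match.group(1))
        if number != keep and (path / "online").exists():
            found.append((number, path))
    return [path for _, path in sorted(found)]


def set_online(cpus, online):
    """Write 1 (online) or 0 (offline) to each CPU's 'online' file."""
    value = "1" if online else "0"
    for cpu in cpus:
        (cpu / "online").write_text(value)
```

On a real machine, running as root, one cycle would be
"cpus = hotpluggable_cpus()", then "set_online(cpus, False)", then
"set_online(cpus, True)". The root argument exists so the logic can be
tried against a scratch directory instead of the live system.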

  I initiated this bug report from an Ubuntu 16.04.3 installation
  running a 4.10 kernel; but as I said, this applies to the 4.13 kernel.

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.10.0-38-generic 4.10.0-38.42~16.04.1
  ProcVersionSignature: User Name 4.10.0-38.42~16.04.1-generic 4.10.17
  Uname: Linux 4.10.0-38-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  Date: Tue Nov 21 17:36:06 2017
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  SourcePackage: linux-hwe
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1733662/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
