On an idle Xenial cloud image I'm seeing:

[ 1485.236760] [<ffff800000086ad0>] __switch_to+0x90/0xa8
[ 1485.236772] [<ffff800000143e80>] __tick_nohz_idle_enter+0x50/0x3f0
[ 1485.236776] [<ffff800000144478>] tick_nohz_idle_enter+0x40/0x70
[ 1485.236785] [<ffff80000010baf0>] cpu_startup_entry+0x288/0x2d8
[ 1485.236791] [<ffff80000008fca8>] secondary_start_kernel+0x120/0x130
[ 1485.236795] [<000000004008290c>] 0x4008290c

After a while I get:

[ 2462.806971] rcu_sched kthread starved for 15002 jiffies! g2579 c2578 f0x0 s3 ->state=0x1
[ 2667.835351] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2667.836918]  0-...: (66 GPs behind) idle=cf0/0/0 softirq=5177/5177 fqs=0 
[ 2667.838801]  2-...: (0 ticks this GP) idle=73a/0/0 softirq=4570/4570 fqs=0 
[ 2667.840696]  3-...: (64 GPs behind) idle=eba/0/0 softirq=4654/4654 fqs=0 
[ 2667.842533]  (detected by 1, t=15002 jiffies, g=2638, c=2637, q=4389)
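To put the "starved for N jiffies" figures into wall-clock terms, something like the following rough sketch can be run over dmesg output. The function name and the regular expression are my own, and HZ=250 is an assumption about the kernel configuration (check CONFIG_HZ for the kernel in question); under that assumption 15002 jiffies would be about 60 seconds.

```python
import re

# Matches lines such as:
# [ 2462.806971] rcu_sched kthread starved for 15002 jiffies! g2579 c2578 ...
STARVED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+rcu_sched kthread starved for "
    r"(?P<jiffies>\d+) jiffies"
)


def starvation_seconds(line, hz=250):
    """Return (dmesg timestamp, starvation time in seconds) or None.

    hz is an ASSUMED tick rate (CONFIG_HZ); it is not stated in the
    bug report, so adjust it to match the kernel being examined.
    """
    match = STARVED.search(line)
    if match is None:
        return None
    return float(match["ts"]), int(match["jiffies"]) / hz
```

Feeding each line of dmesg through starvation_seconds() and dropping the None results gives a list of (timestamp, seconds) pairs, which makes it easier to see whether the starvation periods grow over time.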

At this point sleeps block indefinitely. For example, strace on sleep(1)
on the VM shows nanosleep({1, 0}) never returning; it has to be
interrupted with SIGINT because it never times out.
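A rough way to check for the stuck-sleep state without strace is sketched below. The name probe_sleep and the 10 second deadline are arbitrary choices of mine. The parent busy-polls a monotonic clock rather than sleeping, on the assumption that reading the clock still works even when timer wakeups are not being delivered; I have not verified that assumption on an affected VM.

```python
import subprocess
import sys
import time


def probe_sleep(seconds=1.0, deadline=10.0):
    """Run a child that sleeps for `seconds`; return the elapsed time,
    or None if the child is still asleep after `deadline` seconds.
    """
    start = time.monotonic()
    child = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )
    # Busy-poll on purpose: a sleeping watchdog would rely on the same
    # timer wakeups that are suspected to be broken.
    while child.poll() is None:
        if time.monotonic() - start > deadline:
            child.kill()
            child.wait()  # reap the killed child
            return None
    return time.monotonic() - start
```

On a healthy system probe_sleep(1.0) should return a value slightly above 1.0 (the sleep plus interpreter startup); a None result means the child never woke up within the deadline.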

Also, secondary_start_kernel() in the trace indicates that the VM puts
CPUs to sleep and wakes them on a timer.

I can trigger this more often with more CPUs on the VM and also by
loading the host. For example, producing a lot of cache or memory
activity on the host triggers the initial hangs more frequently than
leaving the host idle.

So I suspect a combination of CPU hotplug and nohz is causing the issues here.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1531768

Title:
  [arm64] lockups some time after booting

Status in Auto Package Testing:
  Triaged
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  I created an 8 CPU arm64 instance on Canonical's Scalingstack (which I
  want to use for armhf autopkgtesting in LXD). I started with wily as
  that has lxd available (it's not yet available in trusty or in the PPA
  for arm64).

  However, pretty much any LXD task that I do (I haven't tried much
  else) on this machine takes unbearably long. A simple "lxc profile set
  default raw.lxc lxc.seccomp=" or "lxc list" takes several minutes.

  I see tons of

  [ 1020.971955] rcu_sched kthread starved for 6000 jiffies! g1095 c1094 f0x0
  [ 1121.166926] INFO: task fsnotify_mark:69 blocked for more than 120 seconds.

  in dmesg (the attached apport info has the complete dmesg).

  ProblemType: Bug
  DistroRelease: Ubuntu 15.10
  Package: linux-image-4.2.0-22-generic 4.2.0-22.27
  ProcVersionSignature: User Name 4.2.0-22.27-generic 4.2.6
  Uname: Linux 4.2.0-22-generic aarch64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jan  7 09:18 seq
   crw-rw---- 1 root audio 116, 33 Jan  7 09:18 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.19.1-0ubuntu5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  Date: Thu Jan  7 09:24:01 2016
  IwConfig:
   eth0      no wireless extensions.

   lo        no wireless extensions.

   lxcbr0    no wireless extensions.
  Lspci:
   00:00.0 Host bridge [0600]: Red Hat, Inc. Device [1b36:0008]
    Subsystem: Red Hat, Inc Device [1af4:1100]
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  PciMultimedia:

  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-22-generic root=LABEL=cloudimg-rootfs earlyprintk
  RelatedPackageVersions:
   linux-restricted-modules-4.2.0-22-generic N/A
   linux-backports-modules-4.2.0-22-generic  N/A
   linux-firmware                            1.149.3
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/auto-package-testing/+bug/1531768/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
