On Wed, May 05, 2021 at 03:12:43PM -0000, dann frazier wrote:
> On ARM, accelerated guests have the host interrupt controller somewhat
> "passed through". I say that while vigorously waving my hands because I
> don't recall the details well - but I do know that we needed to teach
> the entire stack (host kernel -> QEMU -> edk2) about GICv3 and that was
> new in the xenial release. AIUI, the existing hosts in ScalingStack
> would've been X-Gene w/ older GICv2m controllers. I suspect the root
> cause of these issues lies somewhere in the GICv3 support in QEMU and/or
> edk2 - see bug 1675522 for a similar report. I'd definitely suggest
> upgrading to >= bionic if possible. Or, if not, maybe consider enabling
> the cloud archive to get a newer virt stack. If neither is possible, we
> could also of course try and bisect down the fixes for the xenial stack
> and try and SRU them back - but that will certainly take some time.
> 

Thanks for the analysis and for helping to look at this, it's really 
appreciated.

I suspect getting ScalingStack dist-upgraded to a newer release will be 
a difficult task... upgrading the virt stack alone *may* be possible, we 
can check that (the cloud archive doesn't contain edk2 though). It's 
also running Launchpad buildds though. I would expect reluctance to do 
this from their side due to the risks and, annoying as it is, think we 
should plan on identifying & doing the backports. Sorry :(

I'll file a ticket with IS for their input...

-- 
Iain Lane                                  [ i...@orangesquash.org.uk ]
Debian Developer                                   [ la...@debian.org ]
Ubuntu Developer                                   [ la...@ubuntu.com ]

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1926955

Title:
  [VMs on] arm64 eMAG system often failing to reboot with "IRQ Exception
  at 0x000000009BC11F38"

Status in edk2 package in Ubuntu:
  New
Status in linux package in Ubuntu:
  Confirmed
Status in qemu package in Ubuntu:
  New

Bug description:
  We recently installed some new arm64 Ampere eMAG hardware in the
  ScalingStack region which we use to run autopkgtests.

  Guests (currently impish VMs on OpenStack) which we spawn often fail
  to reboot, repeating this over and over:

    IRQ Exception at 0x000000009BC11F38

  Steps to reproduce:

  1. Boot a current impish daily cloud image (have not confirmed other
     releases) in a VM on an Ampere eMAG (have not verified bare metal).
  2. Wait for it to come up, and then issue `sudo reboot`
  3. If the machine comes back up again, repeat #2 a few times. The
     failure is intermittent.
  3a. If it fails, take a look at "openstack console log show <machine id>"
      for the IRQ exception, which repeats indefinitely.
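
  As a rough aid for step 3a, the check for the repeating message could be
  scripted along these lines. This is only a sketch: the function names and
  the repeat threshold are illustrative assumptions, not part of any
  existing tooling, and the input is assumed to be the console log text
  already captured from the guest.

```python
import re

# Matches the firmware message quoted in this bug, with any hex address.
IRQ_RE = re.compile(r"IRQ Exception at 0x[0-9A-Fa-f]+")


def count_irq_exceptions(console_log: str) -> int:
    """Count the console log lines that contain the IRQ exception message."""
    return sum(1 for line in console_log.splitlines() if IRQ_RE.search(line))


def reboot_failed(console_log: str, threshold: int = 2) -> bool:
    """Guess that the reboot hung once the message repeats `threshold` times.

    The threshold is an assumption; a hung guest repeats the message
    indefinitely, so any small value above 1 should do.
    """
    return count_irq_exceptions(console_log) >= threshold
```

  Passing the saved console output to reboot_failed() after each reboot
  would let a loop stop at the first hung guest instead of relying on
  someone reading the log by eye.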

  (stand by, I will get something to attach shortly)
  ---
  ProblemType: Bug
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 May  3 12:45 seq
   crw-rw---- 1 root audio 116, 33 May  3 12:45 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.20.11-0ubuntu65
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CasperMD5CheckResult: unknown
  DistroRelease: Ubuntu 21.10
  Ec2AMI: ami-0000fd9f
  Ec2AMIManifest: FIXME
  Ec2AvailabilityZone: nova
  Ec2InstanceType: m1.small
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lspci-vt:
   -[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
              \-01.0-[01-02]----01.0-[02]--
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:

  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: QEMU KVM Virtual Machine
  Package: linux (not installed)
  PciMultimedia:

  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.11.0-16-generic root=UUID=6bd452c7-16ff-4771-b2e8-36045e00cc68 ro nohz=off
  ProcVersionSignature: Ubuntu 5.11.0-16.17-generic 5.11.12
  RelatedPackageVersions:
   linux-restricted-modules-5.11.0-16-generic N/A
   linux-backports-modules-5.11.0-16-generic  N/A
   linux-firmware                             1.197
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  Tags:  impish ec2-images
  Uname: Linux 5.11.0-16-generic aarch64
  UnreportableReason: This report is about a package that is not installed.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: False
  acpidump:

  dmi.bios.date: 02/06/2015
  dmi.bios.release: 0.0
  dmi.bios.vendor: EFI Development Kit II / OVMF
  dmi.bios.version: 0.0.0
  dmi.chassis.type: 1
  dmi.chassis.vendor: QEMU
  dmi.chassis.version: 1.0
  dmi.modalias: dmi:bvnEFIDevelopmentKitII/OVMF:bvr0.0.0:bd02/06/2015:br0.0:svnQEMU:pnKVMVirtualMachine:pvr1.0:cvnQEMU:ct1:cvr1.0:
  dmi.product.name: KVM Virtual Machine
  dmi.product.version: 1.0
  dmi.sys.vendor: QEMU

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1926955/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
