On 2017/8/2 16:29, Laszlo Ersek wrote:
> On 08/02/17 00:57, Ard Biesheuvel wrote:
>> On 1 August 2017 at 23:29, Laszlo Ersek <[email protected]> wrote:
>>> On 08/01/17 19:23, Ard Biesheuvel wrote:
>>>> On 1 August 2017 at 16:42, Laszlo Ersek <[email protected]> wrote:
>>>>> On 08/01/17 10:34, Zhu Yijun wrote:
>>>>>> Thanks for your reply!
>>>>>>
>>>>>> On 2017/8/1 3:02, Laszlo Ersek wrote:
>>>>>>> On 07/31/17 02:27, Zhu Yijun wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I install a CentOS-7-aarch64 guest img by qemu cdrom, but it hung
>>>>>>>> at UEFI probability.
>>>>>>>>
>>>>>>>> Basic info:
>>>>>>>> libvirt 1.3.5
>>>>>>>> QEMU 2.6.2
>>>>>>>> UEFI: master branch with commit "688c7d2 BaseTools: Fix the bug
>>>>>>>> that warn() function with only 1 argument"
>>>>>>>>
>>>>>>>> Config pflash and two disks in xml:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> <os>
>>>>>>>> <type arch='aarch64' machine='virt-2.6'>hvm</type>
>>>>>>>> <loader readonly='yes'
>>>>>>>> type='pflash'>/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw</loader>
>>>>>>>> <boot dev='hd'/>
>>>>>>>> </os>
>>>>>>>> ...
>>>>>>>> <disk type='file' device='disk'>
>>>>>>>> <driver name='qemu' type='qcow2' cache='none' io='native'/>
>>>>>>>> <source file='/CentOS-7-aarch64/centos.qcow2'/>
>>>>>>>> <backingStore/>
>>>>>>>> <target dev='sda' bus='scsi'/>
>>>>>>>> </disk>
>>>>>>>> <disk type='file' device='cdrom'>
>>>>>>>> <driver name='qemu' type='raw' cache='none' io='native'/>
>>>>>>>> <source
>>>>>>>> file='/CentOS-7-aarch64/CentOS-7-aarch64-Everything.iso'/>
>>>>>>>> <backingStore/>
>>>>>>>> <target dev='sdb' bus='scsi'/>
>>>>>>>> </disk>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> I found it failed at "Match (Translated, TranslatedSize,
>>>>>>>> ActiveOption[Idx].BootOption->FilePath)" function in
>>>>>>>> "SetBootOrderFromQemu", the UEFI debug info as follow:
>>>>>>> No, that's not where the problem is.
See below: >>>>>>> >>>>>>>> start-console-fail.log >>>>>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success >>>>>>>> FSOpen: Open '\EFI\BOOT\fallback.efi' Success >>>>>>>> >>>>>>>> >>>>>>>> Synchronous Exception at 0x00000002384B1104 >>>>>>>> PC 0x0002384B1104 >>>>>>>> PC 0x0002384A916C >>>>>>>> PC 0x0002384CA2D0 >>>>>>>> PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll >>>>>>>> PC 0x00023BD1568C (0x00023BD02000+0x0001368C) [ 2] BdsDxe.dll >>>>>>>> PC 0x00023BD03F98 (0x00023BD02000+0x00001F98) [ 2] BdsDxe.dll >>>>>>>> PC 0x00023BD05640 (0x00023BD02000+0x00003640) [ 2] BdsDxe.dll >>>>>>>> PC 0x00023EEB3704 (0x00023EEB1000+0x00002704) [ 3] DxeCore.dll >>>>>>>> PC 0x00023EEB27C8 (0x00023EEB1000+0x000017C8) [ 3] DxeCore.dll >>>>>>>> PC 0x00023EEB2024 (0x00023EEB1000+0x00001024) [ 3] DxeCore.dll >>>>>>>> [ 1] >>>>>>>> /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll >>>>>>>> [ 2] >>>>>>>> /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Universal/BdsDxe/BdsDxe/DEBUG/BdsDxe.dll >>>>>>>> [ 3] >>>>>>>> /root/rpmbuild/BUILD/edk2-2.6.0/Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll >>>>>>>> >>>>>>>> X0 0x00000002384A9000 X1 0x00000002384B2990 X2 >>>>>>>> 0x000000023AAFDF98 X3 0x000000023BFF0018 >>>>>>>> X4 0x0000000000000000 X5 0x0000000000000007 X6 >>>>>>>> 0x0000000238533300 X7 0x0000000000000000 >>>>>>>> X8 0x000000023C01F548 X9 0x0000000200000000 X10 >>>>>>>> 0x00000002384A8000 X11 0x00000002384C5FFF >>>>>>>> X12 0x0000000000000000 X13 0x0000000000000008 X14 >>>>>>>> 0x259511BDAEB1F36C X15 0x1378CC1DF3F5DDBB >>>>>>>> X16 0x000000023EEB0BE0 X17 0x0000000000000000 X18 >>>>>>>> 0x0000000000000000 X19 0x0000000000000013 >>>>>>>> X20 0x0000000000000000 X21 0x0000000000000000 X22 >>>>>>>> 0x0000000000000000 X23 0x0000000000000000 >>>>>>>> X24 0x0000000000000000 X25 0x0000000000000000 X26 >>>>>>>> 
0x0000000000000000 X27 0x0000000000000000 >>>>>>>> X28 0x0000000000000000 FP 0x000000023EEB0A40 LR 0x00000002384A916C >>>>>>>> >>>>>>>> V0 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF V1 0x63702F6666666666 >>>>>>>> 6666666666666666 >>>>>>>> V2 0x40697363732F3340 6567646972622D69 V3 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V4 0x0000000000000000 0000000000000000 V5 0x4010040140100401 >>>>>>>> 4010040140100401 >>>>>>>> V6 0x0000000000000000 0000000000000000 V7 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V8 0x0000000000000000 0000000000000000 V9 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V10 0x0000000000000000 0000000000000000 V11 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V12 0x0000000000000000 0000000000000000 V13 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V14 0x0000000000000000 0000000000000000 V15 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V16 0x0000000000000000 0000000000000000 V17 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V18 0x0000000000000000 0000000000000000 V19 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V20 0x0000000000000000 0000000000000000 V21 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V22 0x0000000000000000 0000000000000000 V23 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V24 0x0000000000000000 0000000000000000 V25 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V26 0x0000000000000000 0000000000000000 V27 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V28 0x0000000000000000 0000000000000000 V29 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> V30 0x0000000000000000 0000000000000000 V31 0x0000000000000000 >>>>>>>> 0000000000000000 >>>>>>>> >>>>>>>> SP 0x000000023EEB0A40 ELR 0x00000002384B1104 SPSR 0x60000205 FPSR >>>>>>>> 0x00000000 >>>>>>>> ESR 0x02000000 FAR 0x1DE7EC7EDBADC0DE >>>>>>>> >>>>>>>> ESR : EC 0x00 IL 0x1 ISS 0x00000000 >>>>>>>> >>>>>>>> Stack dump: >>>>>>>> 000023EEB0940: 0000C0E000000148 00000002384A9000 00000002384CA254 
>>>>>>>> 0000000000000000 >>>>>>>> 000023EEB0960: 000000023EEB0BC0 000000023AC006C0 0000F2503EEB0BC0 >>>>>>>> 00000002384B6018 >>>>>>>> 000023EEB0980: 000000023EEB0BC0 0000000000000000 000000000000C0E0 >>>>>>>> 0000000000000148 >>>>>>>> 000023EEB09A0: 0000000000000148 0000100000020A8C 00000002384B6110 >>>>>>>> 00000002384B6108 >>>>>>>> 000023EEB09C0: 00000002384B6100 0000000000000006 00000002384B6058 >>>>>>>> 00000002384B50DF >>>>>>>> 000023EEB09E0: 00000002384A9148 0000000000000000 00000002384A9000 >>>>>>>> 00000002384A9000 >>>>>>>> 000023EEB0A00: 0000000000000000 00000002398DA518 00000002385375B2 >>>>>>>> 00000002385629A0 >>>>>>>> 000023EEB0A20: 000000023854C1C0 00000002398DA518 000000023EEB0BC0 >>>>>>>> 0000000000000000 >>>>>>>>> 000023EEB0A40: 000000023EEB0BC0 00000002384CA2D0 000000023AAFDF98 >>>>>>>>> 000000023BFF0018 >>>>>>>> 000023EEB0A60: 00000002384CA360 000000023EEC8348 00000002385375B0 >>>>>>>> 000000023AAFDF98 >>>>>>>> 000023EEB0A80: 000000023EEB0AC0 0000F25038533338 00000002384B6018 >>>>>>>> 0000000000000000 >>>>>>>> 000023EEB0AA0: 0000000000000000 0000000238B63D18 0000000000001000 >>>>>>>> 0000000000000000 >>>>>>>> 000023EEB0AC0: 000000023BFF0018 00000002398DA518 00000002398CE598 >>>>>>>> 0000000000000000 >>>>>>>> 000023EEB0AE0: 0000000000000000 0000000000000000 00000002384C6000 >>>>>>>> 00000000000C99C0 >>>>>>>> 000023EEB0B00: 0000000200000001 0000000000000000 000000023AC006C0 >>>>>>>> 11D295625B1B31A1 >>>>>>>> 000023EEB0B20: 3B7269C9A0003F8E 0000000000000000 0000000238B63F98 >>>>>>>> 000000163EEB0B68 >>>>>>>> ASSERT [ArmCpuDxe] >>>>>>>> /root/rpmbuild/BUILD/edk2-2.6.0/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(271): >>>>>>>> ((BOOLEAN)(0==1)) >>>>>>> This is a guest that you didn't install from installer media. I think >>>>>>> you may have gotten the preinstalled disk image from some image provider >>>>>>> service. 
The UEFI boot variable(s) are not set up to boot the CentOS
>>>>>>> installation, in your nvram / pflash file.
>>>>>> Yes, the boot variable must store in domain's nvram
>>>>>> file("/var/lib/libvirt/qemu/nvram/centos_VARS.fd"). After installed, it
>>>>>> generates an new boot menu
>>>>>> called "CentOS Linux AltArch " which device path is
>>>>>> "HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi".
>>>>>>
>>>>>> such like:
>>>>>> Boot Manager Menu
>>>>>> CentOS Linux AltArch -> device path:
>>>>>> PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>>>> /HD(1,GPT,D562CAA6-F61B-4F93-87FB-22DDADF6CAE2,0x800,0x64000)/\EFI\centos\shim.efi
>>>>>> UEFI Misc Device
>>>>>> UEFI Misc Device 2
>>>>>> EFI Internal Shell
>>>>>> UEFI QEMU QEMU CD-ROM -> device path:
>>>>>> PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x1)
>>>>>> UEFI QEMU QEMU HARDDISK -> device path:
>>>>>> PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>>>> UEFI PXEv4 (MAC:5254002D2EB6)
>>>>>>
>>>>>> But when I shutdown &undefine this domain, and virsh create an new
>>>>>> domain with the disk centos.qcow2 which installed just before, the UEFI
>>>>>> boot manager
>>>>>> menu is:
>>>>>> Boot Manager Menu
>>>>>> UEFI QEMU QEMU HARDDISK -> device path:
>>>>>> PciRoot(0x0)/Pci(0x3,0x0)/Pci(0x0,0x0)/Scsi(0x0,0x0)
>>>>>> UEFI Misc Device
>>>>>> UEFI Misc Device 2
>>>>>> EFI Internal Shell
>>>>>> UEFI PXEv4 (MAC:5254002D2EB6)
>>>>> Right. In this case you have lost your original nvram contents, and you
>>>>> only have the boot options that are auto-generated by the
>>>>> EfiBootManagerRefreshAllBootOption() function. This function lives in
>>>>> UefiBootManagerLib, and is called from OVMF's PlatformBootManagerLib
>>>>> instance.
>>>>>
>>>>> The filtering and reordering still occurs in OVMF, but now the first
>>>>> boot option that matches QEMU's fw_cfg bootorder specification is not
>>>>> the "CentOS Linux AltArch" boot option that you originally had.
Instead,
>>>>> now QemuBootOrderLib encounters the "UEFI QEMU QEMU HARDDISK"
>>>>> auto-generated boot option first as a match.
>>>>>
>>>>> This boot option in turn means "fallback.efi", according to the blog
>>>>> post I linked earlier.
>>>>>
>>>>> When "fallback.efi" executes successfully, your original "CentOS Linux
>>>>> AltArch" boot option is restored / recreated (at the top of the boot
>>>>> option list). But, when "fallback.efi" crashes, you get a crash instead.
>>>>>
>>>>>> I am confused about two points:
>>>>>> 1) The new domain still have chance to load the "EFI\centos\shim.efi"
>>>>>> and boot kernel successful, it means that sometimes the system firmware
>>>>>> launches
>>>>>> the BOOTAA64.EFI, sometimes lauches shim.efi. It is probabilistic.
>>>>> "EFI\centos\shim.efi" is never automatically loaded. It needs a
>>>>> dedicated UEFI boot option. Thus, it can be loaded in your "new" domain
>>>>> *only* if "fallback.efi" runs first, successfully.
>>>>>
>>>>> So what you are seeing is that "fallback.efi" sometimes works, and
>>>>> sometimes crashes. That's the nature of memory corruption bugs.
>>>>>
>>>>>> 2) Is there a way to make the "CentOS Linux AltArch " boot menu
>>>>>> persistent?
>>>>> There isn't. If you lose your nvram, you lose the non-auto-generated
>>>>> boot options with it.
>>>>>
>>>>> Remedying such situations is what "fallback.efi" exists for.
>>>>>
>>>>>>> In such cases, the "fallback.efi" utility is invoked (called
>>>>>>> "\EFI\BOOT\BOOTAA64.EFI). Please refer to:
>>>>>>>
>>>>>>> https://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/
>>>>>>>
>>>>>>> Unfortunately, "fallback.efi" (from the shim package) used to have a few
>>>>>>> bugs over time and sometimes it would crash.
See for example:
>>>>>>>
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1196114
>>>>>>>
>>>>>>> I'm unsure what version of shim / fallback.efi is in the installed
>>>>>>> CentOS image, but it looks like the same (or another similar)
>>>>>>> fallback.efi issue to me.
>>>>>> shim version in my side is shim-0.9-2.el7.aarch64.
>>>>> This confirms that you are not seeing the exact bug described in
>>>>> RHBZ#1196114, because that bug was fixed in shim-0.9 (see
>>>>> <https://bugzilla.redhat.com/show_bug.cgi?id=1196114#c16>).
>>>>>
>>>>> It remains a fact that your original log contains a crash register dump
>>>>> after fallback.efi is launched. The V0 register contains
>>>>> 0xAFAFAFAFAFAFAFAF AFAFAFAFAFAFAFAF; the pattern 0xAF is used to fill
>>>>> released (freed) pages in debug builds. So this seems to be an
>>>>> use-after-free issue. I suggest adding debug instrumentation to
>>>>> fallback.efi, and seeing where exactly it blows up.
>>>>>
>>>> The presence of the 0xAF pattern in register v0 by itself does not
>>>> suggest anything at all: V0 is a SIMD register, which is used by the
>>>> SetMem() routine to poison the memory. There is very little other code
>>>> (if any) that actually uses the SIMD registers otherwise.
>>> Thanks for pointing this out.
>>>
>>> Can you perhaps deduce more info from the stack / register dump? The
>>> topmost three stack frames don't have edk2 module names associated with
>>> them -- does that confirm that the synchronous exception is raised in a
>>> non-edk2 module?
>>>
>> The stack trace is consistent with BDS calling LoadImage() to launch
>> fallback.efi (which is GNU-EFI based so it does not set the NB10
>> Codeview debug entry containing the path on the build host)
>>
>> The FAR (faulting address) register contains the well known bogus
>> value KVM puts in there by default. Also, the exception class field in
>> the ESR (bits 31:26) is 0x0 as well, which translates as an unknown
>> exception.
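
For reference, the ESR reading described above can be reproduced by hand. The following is only a minimal Python sketch, assuming the usual AArch64 ESR layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the helper name decode_esr is made up for illustration and is not part of edk2 or KVM.

```python
def decode_esr(esr: int) -> dict:
    """Split a 32-bit AArch64 ESR value into its EC, IL and ISS fields.

    Hypothetical helper for illustration only; field positions follow
    the layout stated in the lead-in (an assumption of this sketch).
    """
    return {
        "EC": (esr >> 26) & 0x3F,   # exception class, bits 31:26
        "IL": (esr >> 25) & 0x1,    # instruction length bit, bit 25
        "ISS": esr & 0x1FFFFFF,     # instruction specific syndrome, bits 24:0
    }


# ESR value taken from the register dump in the original report.
fields = decode_esr(0x02000000)
print(fields)
```

With the value from the dump this gives EC 0x00, IL 0x1 and ISS 0x0, which agrees with the "ESR : EC 0x00 IL 0x1 ISS 0x00000000" line printed by the exception handler.
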
>> >> Are there any kvm related messages in the host kernel log? This looks >> like the result of kvm_inject_undefined(), which prints some kind of >> diagnostic in many cases. > Thanks, Ard! > > Zhu Yijun -- can you check this?
Yes, I have checked. There are no related KVM messages in dmesg or in /var/log/messages.

> Thanks
> Laszlo
>
>>> (I still think the only way forward is to instrument fallback.efi, and I
>>> won't be doing that.)
>>>
>> Well, if you have access to the ELF file that fallback.efi was built
>> from, you can correlate the stack trace address with locations in the
>> code. Lacking that, it would at least be *very* helpful to know which
>> opcode is being executed when the exception is taken.

I got the shim source code from https://github.com/rhboot/shim.git; the master branch has reached version 12. I am trying to reproduce the crash with this new version.

>> _______________________________________________
>> edk2-devel mailing list
>> [email protected]
>> https://lists.01.org/mailman/listinfo/edk2-devel
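
As a note on correlating stack trace addresses with code locations: for the frames that do carry a module name, the dump already prints the image base and the offset, and the offset is what would be looked up in the ELF file. The following is only a rough Python sketch of that arithmetic; the function name frame_offset and the regular expression are made up for illustration. The load base of fallback.efi does not appear in the log, so the same subtraction for the top three frames would need that base from somewhere else.

```python
import re

# Matches backtrace lines of the form
#   PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll
# The pattern is an assumption based on the dump quoted in this thread.
FRAME_RE = re.compile(
    r"PC 0x([0-9A-Fa-f]+) \(0x([0-9A-Fa-f]+)\+0x([0-9A-Fa-f]+)\)"
)


def frame_offset(line: str):
    """Return the image-relative offset of a backtrace frame, or None
    when the line carries no base/offset pair (frames without a module)."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    pc, base, offset = (int(group, 16) for group in match.groups())
    # Sanity check: the printed offset must equal PC minus image base.
    if pc - base != offset:
        raise ValueError("inconsistent frame: " + line)
    return offset


frame = "PC 0x00023EEB7DF8 (0x00023EEB1000+0x00006DF8) [ 1] DxeCore.dll"
print(hex(frame_offset(frame)))
```

A line such as "PC 0x0002384B1104" has no base/offset pair, so the helper returns None for it; those are exactly the frames for which the ELF file and load address of fallback.efi would be needed.
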

