On Wed, Sep 25, 2019 at 5:55 PM Baoquan He <[email protected]> wrote: > > On 09/20/19 at 12:05am, Kairui Song wrote: > > Currently, kernel fails to boot on some HyperV VMs when using EFI. > > And it's a potential issue on all platforms. > > > > It's caused a broken kernel relocation on EFI systems, when below three > > conditions are met: > > > > 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) > > by the loader. > > 2. There isn't enough room to contain the kernel, starting from the > > default load address (eg. something else occupied part the region). > > 3. In the memmap provided by EFI firmware, there is a memory region > > starts below LOAD_PHYSICAL_ADDR, and suitable for containing the > > kernel. > > Thanks for the effort, Kairui. > > Let me summarize what I got from this issue, please correct me if > anything missed: > > *** > Problem: > This bug is reported on Hyper-V platform. The kernel will reset to > firmware w/o any console printing in 1st kernel and kdump kernel > sometime. > > *** > Root cause: > With debugging, the resetting to firmware is triggered when execute > 'rep movsq' line of /boot/compressed/head_64.S. The reason is that > efi boot stub may put kernel image below 16M, then later head_64.S will > relocate kernel to 16M directly. That relocation will conflict with some > efi reserved region, then cause the resetting. > > A more detail process based on the problem occurred on that HyperV > machine: > > - kernel (INIT_SIZE: 56820K) got loaded at 0x3c881000 (not aligned, > and not equal to pref_address 0x1000000), need to relocate. > > - efi_relocate_kernel is called, try to allocate INIT_SIZE of memory > at pref_address, failed, something else occupied this region. > > - efi_relocate_kernel call efi_low_alloc as fallback, and got the address > 0x800000 (Below 0x1000000) > > - Later in arch/x86/boot/compressed/head_64.S:108, LOAD_PHYSICAL_ADDR is > force used as the new load address as the current address is lower than > that. Then kernel try relocate to 0x1000000. > > - However the memory starting from 0x1000000 is not allocated from EFI > firmware, writing to this region caused the system to reset. > > *** > Solution: > Alwasys search area above LOAD_PHYSICAL_ADDR, namely 16M to put kernel > image in /boot/compressed/eboot.c. Then efi boot stub in eboot.c will > search an suitable area in efi memmap, to make sure no any reserved > region will conflict with the target area of kernel image. Besides, > kernel won't be relocated in /boot/compressed/head_64.S since it has > been above 16M. > > #ifdef CONFIG_RELOCATABLE > leaq startup_32(%rip) /* - $startup_32 */, %rbp > movl BP_kernel_alignment(%rsi), %eax > decl %eax > addq %rax, %rbp > notq %rax > andq %rax, %rbp > cmpq $LOAD_PHYSICAL_ADDR, %rbp > jge 1f > #endif > movq $LOAD_PHYSICAL_ADDR, %rbp > 1: > > /* Target address to relocate to for decompression */ > movl BP_init_size(%rsi), %ebx > subl $_end, %ebx > addq %rbp, %rbx >
Hi Baoquan, Yes, it's all correct. Thanks for adding these details. > > *** > I have one concerns about this patch: > > Why this only happen in Hyper-V platform. Qemu/kvm, baremetal, vmware > ESI don't have this issue? What's the difference? Let me post part the efi memmap on that machine (and btw the kernel size is 55M): kernel: efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x0000000000080000) (0MB) kernel: efi: mem01: type=4, attr=0xf, range=[0x0000000000080000-0x0000000000081000) (0MB) kernel: efi: mem02: type=2, attr=0xf, range=[0x0000000000081000-0x0000000000082000) (0MB) kernel: efi: mem03: type=7, attr=0xf, range=[0x0000000000082000-0x00000000000a0000) (0MB) kernel: efi: mem04: type=4, attr=0xf, range=[0x0000000000100000-0x000000000062a000) (5MB) kernel: efi: mem05: type=7, attr=0xf, range=[0x000000000062a000-0x0000000004200000) (59MB) kernel: efi: mem06: type=4, attr=0xf, range=[0x0000000004200000-0x0000000004400000) (2MB) kernel: efi: mem07: type=7, attr=0xf, range=[0x0000000004400000-0x00000000045c6000) (1MB) kernel: efi: mem08: type=4, attr=0xf, range=[0x00000000045c6000-0x00000000045e6000) (0MB) kernel: efi: mem09: type=3, attr=0xf, range=[0x00000000045e6000-0x000000000460b000) (0MB) kernel: efi: mem10: type=4, attr=0xf, range=[0x000000000460b000-0x0000000004613000) (0MB) kernel: efi: mem11: type=3, attr=0xf, range=[0x0000000004613000-0x000000000462b000) (0MB) kernel: efi: mem12: type=7, attr=0xf, range=[0x000000000462b000-0x0000000004800000) (1MB) kernel: efi: mem13: type=2, attr=0xf, range=[0x0000000004800000-0x0000000007f7d000) (55MB) kernel: efi: mem14: type=7, attr=0xf, range=[0x0000000007f7d000-0x0000000039a39000) (794MB) kernel: efi: mem15: type=2, attr=0xf, range=[0x0000000039a39000-0x0000000040000000) (101MB) kernel: efi: mem16: type=7, attr=0xf, range=[0x0000000040000000-0x000000004263d000) (38MB) kernel: efi: mem17: type=2, attr=0xf, range=[0x000000004263d000-0x000000007fff2000) (985MB) kernel: efi: mem18: type=0, attr=0xf, range=[0x000000007fff2000-0x000000007fff3000) (0MB) kernel: efi: mem19: type=7, attr=0xf, range=[0x000000007fff3000-0x00000000f6aaf000) (1898MB) kernel: efi: mem20: type=2, attr=0xf, range=[0x00000000f6aaf000-0x00000000f6ab0000) (0MB) kernel: efi: mem21: type=1, attr=0xf, range=[0x00000000f6ab0000-0x00000000f6bcd000) (1MB) kernel: efi: mem22: type=2, attr=0xf, range=[0x00000000f6bcd000-0x00000000f6cec000) (1MB) kernel: efi: mem23: type=1, attr=0xf, range=[0x00000000f6cec000-0x00000000f6dfb000) (1MB) kernel: efi: mem24: type=6, attr=0x800000000000000f, range=[0x00000000f6dfb000-0x00000000f6e06000) (0MB) kernel: efi: mem25: type=9, attr=0xf, range=[0x00000000f6e06000-0x00000000f6e07000) (0MB) kernel: efi: mem26: type=3, attr=0xf, range=[0x00000000f6e07000-0x00000000f6eea000) (0MB) kernel: efi: mem27: type=9, attr=0xf, range=[0x00000000f6eea000-0x00000000f6ef2000) (0MB) kernel: efi: mem28: type=6, attr=0x800000000000000f, range=[0x00000000f6ef2000-0x00000000f6f1b000) (0MB) kernel: efi: mem29: type=7, attr=0xf, range=[0x00000000f6f1b000-0x00000000f73c1000) (4MB) kernel: efi: mem30: type=4, attr=0xf, range=[0x00000000f73c1000-0x00000000f7e1b000) (10MB) kernel: efi: mem31: type=3, attr=0xf, range=[0x00000000f7e1b000-0x00000000f7f9b000) (1MB) kernel: efi: mem32: type=5, attr=0x800000000000000f, range=[0x00000000f7f9b000-0x00000000f7fcb000) (0MB) kernel: efi: mem33: type=6, attr=0x800000000000000f, range=[0x00000000f7fcb000-0x00000000f7fef000) (0MB) kernel: efi: mem34: type=0, attr=0xf, range=[0x00000000f7fef000-0x00000000f7ff3000) (0MB) kernel: efi: mem35: type=9, attr=0xf, range=[0x00000000f7ff3000-0x00000000f7ffb000) (0MB) kernel: efi: mem36: type=10, attr=0xf, range=[0x00000000f7ffb000-0x00000000f7fff000) (0MB) kernel: efi: mem37: type=4, attr=0xf, range=[0x00000000f7fff000-0x00000000f8000000) (0MB) kernel: efi: mem38: type=7, attr=0xf, range=[0x0000000100000000-0x0000000108000000) (128MB) kernel: efi: mem39: type=0, attr=0x1, range=[0x00000000000c0000-0x0000000000100000) (0MB) You see, there is a region: kernel: efi: mem05: type=7, attr=0xf, range=[0x000000000062a000-0x0000000004200000) (59MB) Which fits the kernel, and it's below 0x1000000 (16M), and the loader didn't load the kernel to a prefered address (16M), so efi-stub will relocate kernel to that low region. I didn't observe any other platform's firmware will provide a region starts below 16M and large enough to contain kernel, and load kernel into a strange address at the same time. > > By the way, I personally like this way better. Because it is fixing a > potention issue. Efi boot stub code may put kernel below 16M, but the > relocation code in boot/compressed/head_64.S doesn't consider the > possible conflict, and head_64.S have no way to know the efi memmap > information. If this patch can't be accepted, woring around it in > Hyper-V may be a way. > > Thanks > Baoquan > Thanks for the review! -- Best Regards, Kairui Song

