Hi Yanjiang,

Thanks, the description of the issue is clearer now.
Also, I managed to fix my Qualcomm board so that I could reproduce this issue. Please see more comments inline:

On Thu, May 31, 2018 at 11:01 AM, Jin, Yanjiang <[email protected]> wrote:
> Hi Bhupesh,
>
> 1. To be clearer, I have listed my memory layout again here:
>
> In the first kernel, execute the command below to get the last virtual memory address:
>
> # dmesg | grep memory
> ..........
> memory : 0xffff800000200000 - 0xffff801800000000
>
> Then use readelf to get the last program header from the vmcore:
>
> # readelf -l vmcore
>
> ELF Header:
> ...
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   ...
>   LOAD           0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>                  0x0000001680000000 0x0000001680000000  RWE    0
>
> Do a simple calculation:
>
> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
> 0xffff8017ffe00000 != 0xffff801800000000.
>
> The end virtual addresses do not match between vmlinux and the vmcore. If you
> follow the same 3 steps, I think you will get the same results as mine.
>
>
> 2. But why can't you reproduce my issue? The reason is that the address of
> the symbol "log_buf" on my board is located in the last 2M.
> I guess it isn't in the last 2M in your environment, so we get
> different vmcore-dmesg results.
> You can simply check log_buf's address through crash as below:
>
> crash> print log_buf
> $1 = 0xffff8017ffe90000 ""
>
> In vmcore-dmesg.c, the function dump_dmesg_structured() gets the log_buf
> offset through the code below:
>
>   log_buf = read_file_pointer(fd, vaddr_to_offset(log_buf_vaddr));
>   log_buf_offset = vaddr_to_offset(log_buf);
>
> The error happens in vaddr_to_offset(), which reports the following error on my board:
> "No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
>
> If I adjust my memory layout so that log_buf is not in the last 2M,
> vmcore-dmesg succeeds. But the issue still exists: the vmlinux and vmcore
> layouts do not match.
>
> log_buf in the last 2M is not common, but it does happen on my board.
>
>
> 3. Now let's go back to the code itself. Whether or not we can reproduce this
> bug, the issue in the phys_offset code always exists.
>
> In the kernel, arm64_memblock_init() calls round_down() to compute memstart_addr:
>
>   memstart_addr = round_down(memblock_start_of_DRAM(), ARM64_MEMSTART_ALIGN);
>
> memblock_start_of_DRAM() is 0x200000, the base of the first memblock.
> ARM64_MEMSTART_ALIGN is 0x40000000 on my board.
>
> So memstart_addr is 0, and phys_offset = memstart_addr = 0.
>
> But in kexec-tools, phys_offset is set in the function get_memory_ranges_iomem_cb():
>
>   get_memory_ranges_iomem_cb()->set_phys_offset()
>
> This function just takes the base of the first memblock (the first block of
> "/proc/iomem"), with no round_down() operation.
>
> To align with the kernel, kexec-tools should apply a similar round_down() to
> this base, but kexec-tools does not do this step.
> It's hard to get the kernel's round_down() parameters in kexec-tools, but reading
> memstart_addr's value from DEVMEM is safe: we always get the correct
> value regardless of whether KASLR is enabled.

The problem statement is clearer now (thanks for detailing the environment in your last email).

I think I understand the issue with 'memstart_addr' being 0. It is part of a few KASLR-related queries (although a few of them are valid for the non-KASLR case as well) that I have recently asked the arm64 kernel maintainers upstream (please see [1] for details).

There I have asked the maintainers whether we should update the user-space tools that obtain PHYS_OFFSET from the memblocks listed in '/proc/iomem' (by reading the base of the 1st memblock) so that they instead read the value of 'memstart_addr' somehow in user space, or whether the change should be done on the kernel side, calculating 'memstart_addr' as:

    /*
     * Select a suitable value for the base of physical memory.
     */
    memstart_addr = round_down(memblock_start_of_DRAM(),
                               ARM64_MEMSTART_ALIGN);
    if (!memstart_addr)
            memstart_addr = memblock_start_of_DRAM();

Let's wait for an update from the arm64 kernel maintainers, because I think this change might be needed in other user-space tools as well (if we decide to make the change on the user-space side), e.g.
makedumpfile, in addition to kexec-tools, to correctly handle this unique use case where memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN.

[1] https://www.spinics.net/lists/arm-kernel/msg655933.html

Thanks,
Bhupesh

>
>> -----Original Message-----
>> From: Bhupesh Sharma [mailto:[email protected]]
>> Sent: May 30, 2018 23:56
>> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>> <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected];
>> Zheng, Joey <[email protected]>
>> Subject: [This email may be risky] Re: [PATCH] arm64: update PHYS_OFFSET to
>> conform to kernel
>>
>> On 05/30/2018 03:50 PM, Jin, Yanjiang wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Bhupesh Sharma [mailto:[email protected]]
>> >> Sent: May 30, 2018 16:39
>> >> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>> >> <[email protected]>
>> >> Cc: [email protected]; [email protected];
>> >> [email protected]; Zheng, Joey <[email protected]>
>> >> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>> >>
>> >> Hi Yanjiang,
>> >>
>> >> On 05/30/2018 01:09 PM, Jin, Yanjiang wrote:
>> >>>
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: Pratyush Anand [mailto:[email protected]]
>> >>>> Sent: May 30, 2018 12:16
>> >>>> To: Jin, Yanjiang <[email protected]>
>> >>>> Cc: [email protected]; [email protected];
>> >>>> [email protected]
>> >>>> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>> >>>>
>> >>>> Hi Yanjiang,
>> >>>>
>> >>>> On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>>> Hi Pratyush,
>> >>>>>
>> >>>>> Thanks for your help! But please see my reply inline.
>> >>>>>
>> >>>>
>> >>>> [...]
>> >>>>
>> >>>>>>> If an application, for example vmcore-dmesg, wants to access
>> >>>>>>> a kernel symbol located in the last 2M of the address space, it
>> >>>>>>> fails with the error below:
>> >>>>>>>
>> >>>>>>> "No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
>> >>>>>>
>> >>>>>> I think the fix might not be correct.
>> >>>>>>
>> >>>>>> The problem is in vmcore-dmesg, and that is what should be fixed, not kexec.
>> >>>>>> See here
>> >>>>>> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
>> >>>>>
>> >>>>> Firstly, for my patch, vmcore-dmesg is just an auxiliary
>> >>>>> application to help reproduce this issue. The function that
>> >>>>> generates the vmcore is the root cause.
>> >>>>
>> >>>> ...and the function which generates the vmcore is not kexec, but rather
>> >>>> the secondary kernel.
>> >>>>
>> >>>>>
>> >>>>> On the other hand, vmcore-dmesg is part of kexec-tools; it does not have
>> >>>>> a standalone git repo. Even if we want to fix vmcore-dmesg, we still need
>> >>>>> to send the patch to kexec-tools, right?
>> >>>>
>> >>>> Sure. I meant the `kexec` application. We have three applications in
>> >>>> kexec-tools: `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump is
>> >>>> useless and we are going to get rid of it very soon.]
>> >>>>
>> >>>>>
>> >>>>> Yanjiang
>> >>>>>
>> >>>>>> Regarding how symbols are extracted from the vmcore:
>> >>>>>>
>> >>>>>> You do have "NUMBER(PHYS_OFFSET)=" information in the vmcore.
>> >>>>>>
>> >>>>>> You can probably look at the makedumpfile code to see how it extracts
>> >>>>>> information from "NUMBER".
>> >>>>>
>> >>>>> I have looked at makedumpfile before; NUMBER(number) just reads a
>> >>>>> number from the vmcore.
>> >>>>> But as I showed before, the root issue is that the vmcore contains a
>> >>>>> wrong number. My patch fixes the vmcore generation issue; we can't read
>> >>>>> the vmcore at this point since we don't have a vmcore yet.
>> >>>>
>> >>>> ...and IIUC, you were able to get correctly to the end of the
>> >>>> secondary kernel, where you tried vmcore-dmesg and then hit the
>> >>>> issue, right?
>> >>>>
>> >>>> How did you conclude that the vmcore contains a wrong number? It's
>> >>>> unlikely, but if it does, then we have a problem somewhere in the Linux
>> >>>> kernel, not here.
>> >>>
>> >>> Hi Pratyush,
>> >>>
>> >>> I think I have found the root cause. In the Linux kernel,
>> >>> memblock_mark_nomap() reserves some memory ranges for EFI, such as
>> >>> EFI_RUNTIME_SERVICES_DATA and EFI_BOOT_SERVICES_DATA. In my environment,
>> >>> the first 2M of memory is EFI_RUNTIME_SERVICES_DATA, so it can't be seen
>> >>> by the kernel. We also can't mark this EFI memory as "reserved"; only
>> >>> EFI_ACPI_RECLAIM_MEMORY memory can be marked as "reserved" and seen by
>> >>> the kernel.
>> >>> So I don't think this is a kernel issue; we should fix it in kexec-tools.
>> >>> The kernel's call stack is attached for reference:
>> >>>
>> >>> drivers/firmware/efi/arm-init.c
>> >>>
>> >>> efi_init()->reserve_regions()->memblock_mark_nomap()
>> >>>
>> >>> Hi Bhupesh,
>> >>>
>> >>> I guess your environment has no EFI support, or the first memblock
>> >>> is not reserved for EFI, so you can't reproduce this issue.
>> >>
>> >> Perhaps you missed reading my earlier threads on the subject of
>> >> EFI_ACPI_RECLAIM_MEMORY regions being mapped as NOMAP and how that
>> >> causes the crashkernel to panic (please go through [1]).
>> >>
>> >> As of now we haven't found an acceptable-to-all solution for the issue,
>> >> and it needs to be fixed in 'kexec-tools' with a minor fix on the
>> >> kernel side as well.
>> >>
>> >> So, coming back to my environment details: it has both EFI support and
>> >> EFI ACPI RECLAIM regions.
>> >>
>> >> However, we may be hitting a special case in your environment, so
>> >> before we discuss your patch further (as both Pratyush and I have
>> >> concerns with it), I would request that you share the following:
>> >>
>> >> - output of kernel dmesg with 'efi=debug' added to the bootargs
>> >>   (which will help us see how the memblocks are marked on your setup;
>> >>   I am specifically interested in the logs after the line 'Processing
>> >>   EFI memory map'),
>> >
>> > I investigated my board further. I believe the firmware design
>> > leads to the differences between our environments:
>> >
>> > My firmware defines the first two EFI blocks as below:
>> >
>> > Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
>> > Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>> >
>> > But the EFI API won't return the "EfiReservedMemType" memory to the Linux
>> > kernel for security reasons, so the kernel can't get any info about the
>> > first memory block; it can only see Region2, as below:
>> >
>> > efi: Processing EFI memory map:
>> > efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
>> >
>> > # head -1 /proc/iomem
>> > 00200000-0021ffff : reserved
>>
>> I have the same case on boards at my end:
>>
>> # head -1 /proc/iomem
>> 00200000-0021ffff : reserved
>>
>> # dmesg | grep -i "Processing EFI memory map" -A 5
>> [    0.000000] efi: Processing EFI memory map:
>> [    0.000000] efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
>> [    0.000000] efi: 0x000000400000-0x0000005fffff [ACPI Memory NVS | | | | | | | | | | | |UC]
>> [    0.000000] efi: 0x000000800000-0x00000081ffff [ACPI Memory NVS | | | | | | | | | | | |UC]
>> [    0.000000] efi: 0x000000820000-0x000001600fff [Conventional Memory| | | | | | | | |WB|WT|WC|UC]
>> [    0.000000] efi: 0x000001601000-0x0000027fffff [Loader Data | | | | | | | | |WB|WT|WC|UC]
>>
>> So no, your environment is not a special one (as I also use ATF as the
>> EL3 boot firmware); see more below...
>>
>> > There are many EfiReservedMemType regions in ARM64 firmware if it
>> > supports TrustZone, but if the firmware doesn't put this type of memory
>> > region at the start of physical memory, this error won't happen. I don't
>> > think the firmware is in error, since it can reserve any memory regions;
>> > we'd better update kexec-tools.
>> > Anyway, reading memstart_addr from /dev/mem always gets the correct value
>> > if DEVMEM is defined.
>>
>> ...At my side, with the latest upstream kernel (with commit
>> f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a reverted to allow the crashkernel
>> to boot while accessing ACPI tables) and the latest upstream kexec-tools, I
>> can boot the crashkernel properly, collect the vmcore properly, and analyze
>> the crash dump via tools like gdb and crash.
>>
>> So, I will also try the vmcore-dmesg tool and see if I get any issues
>> with it. Until then, you can check whether there are any other obvious
>> differences in your environment which might be causing this issue at your end.
>>
>> Thanks,
>> Bhupesh
>>
>> >>
>> >> - if you are using a public arm64 platform, maybe you can share the
>> >>   CONFIG file,
>> >> - output of 'cat /proc/iomem'
>> >>
>> >> [1] https://www.spinics.net/lists/arm-kernel/msg616632.html
>> >>
>> >> Thanks,
>> >> Bhupesh
>> >>
>> >>>> Have you tried to extract "PHYS_OFFSET" from the vmcore, either in
>> >>>> vmcore-dmesg or in makedumpfile, and found it not matching the value
>> >>>> of "PHYS_OFFSET" from the first kernel?
>> >>>>
>> >>>> In my understanding, the flow is like this:
>> >>>>
>> >>>> - The first kernel will have a reserved area for the secondary kernel,
>> >>>>   as well as for the elfcore.
>> >>>> - The first kernel will embed all the vmcore information notes into
>> >>>>   the elfcore (see crash_save_vmcoreinfo_init() -> arch_crash_save_vmcoreinfo()).
>> >>>>   Therefore, we will have PHYS_OFFSET, kimage_voffset and VA_BITS
>> >>>>   information for the first kernel in the vmcore, which is in separate
>> >>>>   memory and can be read by the second kernel.
>> >>>> - The elfcore will also have notes about all the other physical memory
>> >>>>   of the first kernel which needs to be copied by the second kernel.
>> >>>> - Now, when a crash happens, the second kernel should have all the
>> >>>>   required info for reading symbols from the first kernel's physical
>> >>>>   memory, no?
>> >>>>
>> >>>>>
>> >>>>> NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
>> >>>>>
>> >>>>> Yanjiang
>> >>>>>
>> >>>>>>
>> >>>>>> Once you know the real PHYS_OFFSET (which could have been random
>> >>>>>> if KASLR is enabled), you can fix the problem you are seeing.
>> >>>>>
>> >>>>> I have validated both with and without KASLR; all of them worked well
>> >>>>> after applying my patch.
>> >>>>
>> >>>> IMHO, even if that works, it does not mean that it's a good fix. We
>> >>>> should try to find the root cause. Moreover, you might not have
>> >>>> /dev/mem available in all configurations where KASLR is enabled.
>> >>>>
>> >>>> Regards
>> >>>> Pratyush
>> >>>
>> >>>
>> >>>
>> >>> This email is intended only for the named addressee. It may contain
>> >>> information that is confidential/private, legally privileged, or
>> >>> copyright-protected, and you should handle it accordingly. If you are
>> >>> not the intended recipient, you do not have legal rights to retain,
>> >>> copy, or distribute this email or its contents, and should promptly
>> >>> delete the email and all electronic copies in your system; do not
>> >>> retain copies in any media. If you have received this email in error,
>> >>> please notify the sender promptly. Thank you.

_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
