On Sat, Jun 2, 2018 at 3:20 AM, Bhupesh Sharma <[email protected]> wrote:
> Hi Yanjiang,
>
> Thanks, the description of the issue is clearer now.
>
> Also I managed to fix my Qualcomm board to reproduce this issue.
> Please see more comments inline:
>
> On Thu, May 31, 2018 at 11:01 AM, Jin, Yanjiang
> <[email protected]> wrote:
>> Hi Bhupesh,
>>
>> 1. To be clearer, I listed my memory layout again here:
>>
>> In the first kernel, execute the below command to get the last virtual
>> memory:
>>
>> # dmesg | grep memory
>> ..........
>>     memory  : 0xffff800000200000 - 0xffff801800000000
>>
>> Then use readelf to get the last Program Header from vmcore:
>>
>> # readelf -l vmcore
>>
>> ELF Header:
>> ........................
>>
>> Program Headers:
>>   Type   Offset             VirtAddr           PhysAddr
>>          FileSiz            MemSiz             Flags  Align
>> ............................................................
>>   LOAD   0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>>          0x0000001680000000 0x0000001680000000 RWE    0
>>
>> Do a simple calculation:
>>
>> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
>> 0xFFFF8017FFE00000 != 0xffff801800000000.
>>
>> The ending virtual addresses in vmlinux and vmcore do not match. If you
>> do the same 3 steps, I think you will get the same results as mine.
>>
>>
>> 2. But why can't you reproduce my issue? The reason is that the address of
>> my symbol “log_buf” is located in the last 2M.
>> I guess it isn’t in the last 2M bytes on your environment, so we get
>> different vmcore-dmesg results.
>> You can simply check the log_buf’s address through crash as below:
>>
>> crash> print log_buf
>> $1 = 0xffff8017ffe90000 ""
>>
>> In vmcore-dmesg.c, the function dump_dmesg_structured() wants to get the
>> log_buf offset through the below code:
>>
>> log_buf = read_file_pointer(fd, vaddr_to_offset(log_buf_vaddr));
>> log_buf_offset = vaddr_to_offset(log_buf);
>>
>> The error happens in vaddr_to_offset(); it reports the below error on my
>> board:
>> “No program header covering vaddr 0xffff8017ffe90000 found kexec bug?”
>>
>> If I adjust my memory’s layout so that log_buf is not in the last 2M,
>> vmcore-dmesg will succeed. But the issue still exists: the layouts of
>> vmlinux and vmcore do not match.
>>
>> log_buf in the last 2M is not common, but it does happen on my board.
>>
>>
>> 3. Now let's go back to the code itself. Whether or not we can reproduce
>> this bug, the issue in the phys_offset code always exists.
>>
>> In kernel:
>> arm64_memblock_init() calls round_down to recalculate memstart_addr:
>>
>> memstart_addr = round_down(memblock_start_of_DRAM(), ARM64_MEMSTART_ALIGN);
>>
>> memblock_start_of_DRAM() is 0x200000, it is the first memblock’s base.
>> ARM64_MEMSTART_ALIGN is 0x40000000 on my board.
>>
>> So memstart_addr is 0, and phys_offset = memstart_addr = 0;
>>
>> But in kexec-tools:
>> phys_offset is set in the function get_memory_ranges_iomem_cb():
>>
>> get_memory_ranges_iomem_cb()->set_phys_offset().
>>
>> This function just gets the first memblock’s base (first block of
>> “/proc/iomem”), with no round_down() operation.
>>
>> To align with the kernel, kexec-tools should call a similar round_down()
>> function for this base. But obviously, kexec-tools doesn’t do this step.
>> It’s hard to get the kernel’s round_down parameters in kexec-tools, but
>> reading memstart_addr’s value from DEVMEM is safe, we can always get the
>> correct value regardless of whether KASLR is enabled.
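
The arithmetic in points 1 to 3 above can be checked with a small standalone C sketch. This is only an illustration under stated assumptions, not code from the kernel or from kexec-tools: round_down_pow2(), linear_vaddr(), struct segment, last_load_segment() and segment_covers() are hypothetical helpers, PAGE_OFFSET_ASSUMED is inferred from the dmesg "memory" line quoted above (virtual 0xffff800000200000 for physical 0x200000), and every other constant is a value quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread for the affected board. */
#define FIRST_IOMEM_BASE    0x0000000000200000ULL /* first /proc/iomem entry */
#define MEMSTART_ALIGN      0x0000000040000000ULL /* ARM64_MEMSTART_ALIGN there */
#define LAST_LOAD_PADDR     0x0000000180000000ULL /* PhysAddr of last LOAD */
#define LAST_LOAD_MEMSZ     0x0000001680000000ULL /* MemSiz of last LOAD */
#define LINEAR_MAP_END      0xffff801800000000ULL /* end of "memory" in dmesg */
#define LOG_BUF_VADDR       0xffff8017ffe90000ULL /* crash> print log_buf */

/* Assumption: start of the linear map, inferred from the dmesg line. */
#define PAGE_OFFSET_ASSUMED 0xffff800000000000ULL

/* Hypothetical helper: round x down to a power-of-two alignment. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* phys_offset as the kernel derives it (rounded-down DRAM start). */
static uint64_t kernel_phys_offset(void)
{
	return round_down_pow2(FIRST_IOMEM_BASE, MEMSTART_ALIGN);
}

/* phys_offset taken directly from the first /proc/iomem base. */
static uint64_t iomem_phys_offset(void)
{
	return FIRST_IOMEM_BASE;
}

/* Linear-map virtual address of paddr for a given phys_offset. */
static uint64_t linear_vaddr(uint64_t paddr, uint64_t phys_offset)
{
	return paddr - phys_offset + PAGE_OFFSET_ASSUMED;
}

/* Hypothetical stand-in for one LOAD program header. */
struct segment {
	uint64_t vaddr;
	uint64_t memsz;
};

/* Last LOAD segment as it would be described for a given phys_offset. */
static struct segment last_load_segment(uint64_t phys_offset)
{
	struct segment seg = {
		linear_vaddr(LAST_LOAD_PADDR, phys_offset),
		LAST_LOAD_MEMSZ
	};
	return seg;
}

/* True if vaddr lies inside [seg->vaddr, seg->vaddr + seg->memsz). */
static bool segment_covers(const struct segment *seg, uint64_t vaddr)
{
	return vaddr >= seg->vaddr && vaddr - seg->vaddr < seg->memsz;
}
```

Under these assumptions, last_load_segment(iomem_phys_offset()) reproduces the VirtAddr seen in the vmcore and ends 2M short of LINEAR_MAP_END, so segment_covers() rejects LOG_BUF_VADDR; with kernel_phys_offset() the segment ends exactly at LINEAR_MAP_END and covers it.
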
>
> The problem statement is clearer now (thanks for detailing the
> environment in your last email).
>
> I think I understand the issue with 'memstart_addr' being 0 and it is
> part of a few KASLR (although a few of them are valid for non-KASLR
> case as well) related queries that I have recently asked arm64 kernel
> maintainers upstream (please see [1] for details).
>
> There I have asked the maintainers about their views regarding whether
> we should update the user-space tools which use the memblocks listed
> in '/proc/iomem' to obtain the value of PHY_OFFSET (by reading the
> base of the 1st memblock) and read the value of 'memstart_addr'
> somehow in user-space to get the PHY_OFFSET, or should the change be
> done at the kernel end to calculate 'memstart_addr' as:
>
>         /*
>          * Select a suitable value for the base of physical memory.
>          */
>         memstart_addr = round_down(memblock_start_of_DRAM(),
>                                    ARM64_MEMSTART_ALIGN);
>         if (memstart_addr)

Sorry for the typo: I meant if (!memstart_addr) above

Thanks,
Bhupesh

>             memstart_addr = memblock_start_of_DRAM();
>
> Let's wait for an update from the ARM64 kernel maintainers, because I
> think this change might be needed in other user-space tools (if we
> decide to make the change in the user-space side) e.g. makedumpfile in
> addition to kexec-tools to correctly handle this unique use-case where
> we have value of memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN
>
> [1] https://www.spinics.net/lists/arm-kernel/msg655933.html
>
> Thanks,
> Bhupesh
>
>
>>
>>> -----Original Message-----
>>> From: Bhupesh Sharma [mailto:[email protected]]
>>> Sent: May 30, 2018 23:56
>>> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>>> <[email protected]>
>>> Cc: [email protected]; [email protected]; [email protected];
>>> Zheng, Joey <[email protected]>
>>> Subject: [This email may be risky] Re: [PATCH] arm64: update PHYS_OFFSET to
>>> conform to kernel
>>>
>>> On 05/30/2018 03:50 PM, Jin, Yanjiang wrote:
>>> >
>>> >
>>> >> -----Original Message-----
>>> >> From: Bhupesh Sharma [mailto:[email protected]]
>>> >> Sent: May 30, 2018 16:39
>>> >> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>>> >> <[email protected]>
>>> >> Cc: [email protected]; [email protected];
>>> >> [email protected]; Zheng, Joey <[email protected]>
>>> >> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>>> >>
>>> >> Hi Yanjiang,
>>> >>
>>> >> On 05/30/2018 01:09 PM, Jin, Yanjiang wrote:
>>> >>>
>>> >>>
>>> >>>> -----Original Message-----
>>> >>>> From: Pratyush Anand [mailto:[email protected]]
>>> >>>> Sent: May 30, 2018 12:16
>>> >>>> To: Jin, Yanjiang <[email protected]>
>>> >>>> Cc: [email protected]; [email protected];
>>> >>>> [email protected]
>>> >>>> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>>> >>>>
>>> >>>> Hi Yanjiang,
>>> >>>>
>>> >>>> On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
>>> >>>> <[email protected]>
>>> >>>> wrote:
>>> >>>>> Hi Pratyush,
>>> >>>>>
>>> >>>>> Thanks for your help! but please see my reply inline.
>>> >>>>>
>>> >>>>
>>> >>>> [...]
>>> >>>>
>>> >>>>>>> If an application, for example, vmcore-dmesg, wants to access
>>> >>>>>>> the kernel symbol which is located in the last 2M address, it
>>> >>>>>>> would fail with the below error:
>>> >>>>>>>
>>> >>>>>>> "No program header covering vaddr 0xffff8017ffe90000 found
>>> >>>>>>> kexec bug?"
>>> >>>>>>
>>> >>>>>> I think, fix might not be correct.
>>> >>>>>>
>>> >>>>>> Problem is in vmcore-dmesg and that should be fixed and not the
>>> >>>>>> kexec.
>>> >>>>>> See here
>>> >>>>>> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
>>> >>>>>
>>> >>>>> Firstly, for my patch, vmcore-dmesg is just an auxiliary
>>> >>>>> application to help to reproduce this issue. The function, which is
>>> >>>>> to generate vmcore, is the root cause.
>>> >>>>
>>> >>>> ...and the function which generates vmcore is not the kexec rather
>>> >>>> the secondary kernel.
>>> >>>>
>>> >>>>>
>>> >>>>> On the other hand, vmcore-dmesg is under kexec-tools, it has no
>>> >>>>> standalone git repo. Even if we want to fix vmcore-dmesg, we still
>>> >>>>> need to send the patch to kexec-tools, right?
>>> >>>>
>>> >>>> Sure. I meant `kexec` application. We have three applications in
>>> >>>> kexec-tools.
>>> >>>> `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump is useless and
>>> >>>> we are going to get rid of it very soon.]
>>> >>>>
>>> >>>>>
>>> >>>>> Yanjiang
>>> >>>>>
>>> >>>>>> How symbols are extracted from vmcore.
>>> >>>>>>
>>> >>>>>> You do have "NUMBER(PHYS_OFFSET)=" information in vmcore.
>>> >>>>>>
>>> >>>>>> You can probably see makedumpfile code, that how to extract
>>> >>>>>> information from "NUMBER".
>>> >>>>>
>>> >>>>> I have seen makedumpfile before, NUMBER(number) just reads a
>>> >>>>> number from vmcore. But as I showed before, the root issue is that
>>> >>>>> vmcore contains a wrong number, my patch is to fix the vmcore
>>> >>>>> generating issue, we can't read vmcore at this point since we don't
>>> >>>>> have vmcore yet.
>>> >>>>
>>> >>>> ..and IIUC, you were able to reach correctly till the end of
>>> >>>> secondary kernel where you tried vmcore-dmesg and then you had
>>> >>>> issue, right?
>>> >>>>
>>> >>>> How did you conclude that vmcore contains wrong number? It's
>>> >>>> unlikely, but if it does then we have problem somewhere in Linux
>>> >>>> kernel, not here.
>>> >>>
>>> >>> Hi Pratyush,
>>> >>>
>>> >>> I think I have found the root cause. In Linux kernel,
>>> >>> memblock_mark_nomap() will reserve some memory ranges for EFI, such as
>>> >>> EFI_RUNTIME_SERVICES_DATA, EFI_BOOT_SERVICES_DATA. On my environment,
>>> >>> the first 2M memory is EFI_RUNTIME_SERVICES_DATA, so it can't be seen
>>> >>> in kernel. We also can't set this EFI memory as "reserved", only
>>> >>> EFI_ACPI_RECLAIM_MEMORY's memory can be set as "reserved" and seen in
>>> >>> kernel.
>>> >>> So I don't think this is a kernel issue, we should fix it in
>>> >>> kexec-tools.
>>> >>> Attach kernel's call stack for reference.
>>> >>>
>>> >>> drivers/firmware/efi/arm-init.c
>>> >>>
>>> >>> efi_init()->reserve_regions()->memblock_mark_nomap()
>>> >>>
>>> >>> Hi Bhupesh,
>>> >>>
>>> >>> I guess your environment has no EFI support, or the first memblock
>>> >>> is not reserved for EFI, so you can't reproduce this issue.
>>> >>
>>> >> Perhaps you missed reading my earlier threads on the subject of
>>> >> EFI_ACPI_RECLAIM_MEMORY regions being mapped as NOMAP and how it
>>> >> causes the crashkernel to panic (please go through [1]).
>>> >>
>>> >> As of now we haven't found an acceptable-to-all solution for the issue
>>> >> and it needs to be fixed in the 'kexec-tools' with a minor fix in the
>>> >> kernel side as well.
>>> >>
>>> >> So, coming back to my environment details, it has both EFI support as
>>> >> well as EFI ACPI RECLAIM regions.
>>> >>
>>> >> However we may be hitting a special case in your environment, so I
>>> >> think before we can discuss your patch further (as both Pratyush and
>>> >> myself have concerns with the same), would request you to share the
>>> >> following:
>>> >>
>>> >> - output of kernel dmesg with 'efi=debug' added in the bootargs
>>> >> (which will help us see how the memblocks are marked at your setup -
>>> >> I am specifically interested in the logs after the line 'Processing
>>> >> EFI memory map'),
>>> >
>>> > I made more investigation on my board. I believe that the firmware
>>> > design leads to this difference between our environments:
>>> >
>>> > My firmware defines the first two EFI blocks as below:
>>> >
>>> > Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
>>> > Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>>> >
>>> > But EFI API won't return the "EfiReservedMemType" memory to Linux kernel
>>> > for security reasons, so kernel can't get any info about the first mem
>>> > block, kernel can only see region2 as below:
>>> >
>>> > efi: Processing EFI memory map:
>>> > efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
>>> >
>>> > # head -1 /proc/iomem
>>> > 00200000-0021ffff : reserved
>>>
>>> I have the same case on boards at my end:
>>>
>>> # head -1 /proc/iomem
>>> 00200000-0021ffff : reserved
>>>
>>> # dmesg | grep -i "Processing EFI memory map" -A 5
>>> [ 0.000000] efi: Processing EFI memory map:
>>> [ 0.000000] efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | | | | | |WB|WT|WC|UC]
>>> [ 0.000000] efi: 0x000000400000-0x0000005fffff [ACPI Memory NVS | | | | | | | | | | | |UC]
>>> [ 0.000000] efi: 0x000000800000-0x00000081ffff [ACPI Memory NVS | | | | | | | | | | | |UC]
>>> [ 0.000000] efi: 0x000000820000-0x000001600fff [Conventional Memory| | | | | | | | |WB|WT|WC|UC]
>>> [ 0.000000] efi: 0x000001601000-0x0000027fffff [Loader Data | | | | | | | | |WB|WT|WC|UC]
>>>
>>> So, no your environment is not a special one (as I also use ATF as the
>>> EL3 boot firmware), see more below ..
>>>
>>> > There are many EfiReservedMemType regions in ARM64's firmware if it
>>> > supports TrustZone, but if a firmware doesn't put this type of memory
>>> > region at the start of physical memory, this error wouldn't happen. I
>>> > don't think firmware has error since it can reserve any memory regions,
>>> > we'd better update kexec-tools.
>>> > Anyway, read memstart_addr from /dev/mem can always get a correct value
>>> > if DEVMEM is defined.
>>>
>>> .. At my side with the latest upstream kernel (with commit
>>> f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a reverted to allow crashkernel to
>>> boot while accessing ACPI tables) and latest upstream kexec-tools, I can
>>> boot the crashkernel properly, collect the vmcore properly and analyze the
>>> crash dump via tools like gdb and crash also.
>>>
>>> So, I would try to also use the vmcore-dmesg tool and see if I get any
>>> issues with the same. Till then you can try and see if there are any other
>>> obvious differences in your environment which might be causing this issue
>>> at your end.
>>>
>>> Thanks,
>>> Bhupesh
>>>
>>>
>>> >> - if you are using a public arm64 platform maybe you can share the
>>> >> CONFIG file,
>>> >> - output of 'cat /proc/iomem'
>>> >>
>>> >> [1] https://www.spinics.net/lists/arm-kernel/msg616632.html
>>> >>
>>> >> Thanks,
>>> >> Bhupesh
>>> >>
>>> >>>> Have you tried to extract "PHYS_OFFSET" from vmcore either in
>>> >>>> vmcore-dmesg or in makedumpfile and found it not matching to the
>>> >>>> value of "PHYS_OFFSET" from first kernel?
>>> >>>>
>>> >>>> In my understanding flow is like this:
>>> >>>>
>>> >>>> - First kernel will have reserved area for secondary kernel, as
>>> >>>> well as for elfcore.
>>> >>>> - First kernel will embed all the vmcore information notes into
>>> >>>> elfcore (see
>>> >>>> crash_save_vmcoreinfo_init() -> arch_crash_save_vmcoreinfo()).
>>> >>>> Therefore, we will have PHYS_OFFSET, kimage_voffset and VA_BITS
>>> >>>> information for first kernel in vmcore, which is in separate memory
>>> >>>> and can be read by second kernel
>>> >>>> - elfcore will also have notes about all the other physical memory
>>> >>>> of first kernel which need to be copied by second kernel.
>>> >>>> - Now when crash happens, second kernel should have all the
>>> >>>> required info for reading symbols from first kernel's physical memory,
>>> >>>> no?
>>> >>>>
>>> >>>>>
>>> >>>>> NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
>>> >>>>>
>>> >>>>> Yanjiang
>>> >>>>>
>>> >>>>>>
>>> >>>>>> Once you know the real PHYS_OFFSET (which could have been random
>>> >>>>>> if KASLR is enabled), you can fix the problem you are seeing.
>>> >>>>>
>>> >>>>> I have both validated with/without KASLR, all of them worked well
>>> >>>>> after applying my patch.
>>> >>>>
>>> >>>> IMHO, even if that works it does not mean that its good a fix. We
>>> >>>> should try to find root cause. Moreover, you might not have
>>> >>>> /dev/mem available for all the configuration where KASLR is enabled.
>>> >>>>
>>> >>>> Regards
>>> >>>> Pratyush
>>> >>>
>>> >>>
>>> >>>
>>> >>> This email is intended only for the named addressee. It may contain
>>> >>> information that is confidential/private, legally privileged, or
>>> >>> copyright-protected, and you should handle it accordingly.
>>> >>> If you are not the intended recipient, you do not have legal rights to
>>> >>> retain, copy, or distribute this email or its contents, and should
>>> >>> promptly delete the email and all electronic copies in your system; do
>>> >>> not retain copies in any media. If you have received this email in
>>> >>> error, please notify the sender promptly. Thank you.
>>
_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
