Re: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel

Bhupesh Sharma Fri, 01 Jun 2018 14:50:51 -0700

Hi Yanjiang,

Thanks, the description of the issue is clearer now.

Also, I managed to fix my Qualcomm board so that I can reproduce this
issue. Please see more comments inline:

On Thu, May 31, 2018 at 11:01 AM, Jin, Yanjiang
<[email protected]> wrote:
> Hi Bhupesh,
>
> 1.  To be clearer, I listed my memory layout again here:
>
> In the first kernel, execute the command below to get the end of the virtual memory range:
>
> #dmesg | grep memory
> ..........
> memory  : 0xffff800000200000 - 0xffff801800000000
>
> Then use readelf to get the last Program Header from vmcore:
>
> # readelf -l vmcore
>
> ELF Header:
> ........................
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr               
>   FileSiz            MemSiz              Flags  Align
> ..............................................................................................................................................................
>   LOAD        0x0000000076d40000 0xffff80017fe00000 0x0000000180000000        
>          0x0000001680000000 0x0000001680000000  RWE    0
>
> Do a simple calculation:
>
> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = 
> 0xFFFF8017FFE00000 != 0xffff801800000000.
>
> The end virtual addresses of vmlinux and vmcore do not match. If you 
> do the same 3 steps, I think you will get the same results as mine.
>
>
> 2. But why can’t you reproduce my issue? The reason is that the address of 
> my symbol “log_buf” is located in the last 2M.
> I guess it isn’t in the last 2M bytes in your environment, so we get 
> different vmcore-dmesg results.
> You can simply check log_buf’s address through crash as below:
>
> crash> print log_buf
> $1 = 0xffff8017ffe90000 ""
>
> In vmcore-dmesg.c, the function dump_dmesg_structured() gets the log_buf 
> offset with the code below:
>
> log_buf = read_file_pointer(fd, vaddr_to_offset(log_buf_vaddr));
> log_buf_offset = vaddr_to_offset(log_buf);
>
> An error happens in vaddr_to_offset(); it reports the error below on my board:
> “No program header covering vaddr 0xffff8017ffe90000 found kexec bug?”
>
> If I adjust my memory layout so that log_buf is not in the last 2M, 
> vmcore-dmesg succeeds. But the issue still exists: the layouts of vmlinux 
> and vmcore do not match.
>
> log_buf in the last 2M is not common, but it does happen on my board.
>
>
> 3. Now let's go back to the code itself. Whether or not we can reproduce 
> this bug, the problem in the phys_offset code always exists.
>
> In kernel:
> arm64_memblock_init() calls round_down to recalculate memstart_addr:
>
> memstart_addr = round_down(memblock_start_of_DRAM(),  ARM64_MEMSTART_ALIGN);
>
> memblock_start_of_DRAM() is 0x200000, it is the first memblock’s base.
> ARM64_MEMSTART_ALIGN is 0x40000000 on my board.
>
> So memstart_addr is 0, and phys_offset = memstart_addr = 0;
>
> But in kexec-tools:
> phys_offset is set in the function get_memory_ranges_iomem_cb() :
>
> get_memory_ranges_iomem_cb()->set_phys_offset().
>
> This function just takes the first memblock’s base (the first block of 
> “/proc/iomem”), with no round_down() operation.
>
> To align with the kernel, kexec-tools should call a similar round_down() 
> function on this base. But obviously, kexec-tools doesn’t do this step.
> It’s hard to get the kernel’s round_down parameters in kexec-tools, but 
> reading memstart_addr’s value from DEVMEM is safe; we can always get the 
> correct value regardless of whether KASLR is enabled.

The problem statement is clearer now (thanks for detailing the
environment in your last email).
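
To make the mismatch concrete, here is a minimal sketch (not the actual
kexec-tools code) that redoes the arithmetic with the values you posted.
The struct, the function names and the macro names are hypothetical and
only for illustration; the numbers are copied from your readelf, dmesg
and crash output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for an ELF64 PT_LOAD header. */
struct load_segment {
	uint64_t p_offset;	/* file offset of the segment in vmcore */
	uint64_t p_vaddr;	/* first virtual address covered */
	uint64_t p_memsz;	/* size of the segment in memory */
};

/* Last LOAD entry from the "readelf -l vmcore" output quoted above. */
const struct load_segment last_load = {
	.p_offset = 0x0000000076d40000ULL,
	.p_vaddr  = 0xffff80017fe00000ULL,
	.p_memsz  = 0x0000001680000000ULL,
};

#define LINEAR_MAP_END	0xffff801800000000ULL	/* "dmesg | grep memory" */
#define LOG_BUF_VADDR	0xffff8017ffe90000ULL	/* "crash> print log_buf" */

/* First virtual address past the end of the segment. */
uint64_t segment_end(const struct load_segment *seg)
{
	return seg->p_vaddr + seg->p_memsz;
}

/*
 * Translate vaddr to a file offset by scanning the segments.
 * Returns 0 and stores the result in *offset on success, or -1 when
 * no segment covers vaddr (the case that ends in the
 * "No program header covering vaddr" error).
 */
int lookup_offset(const struct load_segment *segs, size_t nsegs,
		  uint64_t vaddr, uint64_t *offset)
{
	assert(segs != NULL && offset != NULL);

	for (size_t i = 0; i < nsegs; i++) {
		if (vaddr >= segs[i].p_vaddr && vaddr < segment_end(&segs[i])) {
			*offset = segs[i].p_offset + (vaddr - segs[i].p_vaddr);
			return 0;
		}
	}
	return -1;
}
```

With these values segment_end(&last_load) is 0xffff8017ffe00000, which is
0x200000 (2M) short of LINEAR_MAP_END, and lookup_offset() returns -1 for
LOG_BUF_VADDR because log_buf sits inside that uncovered 2M.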

I think I understand the issue with 'memstart_addr' being 0. It is one
of a few KASLR-related queries (some of which are valid for the
non-KASLR case as well) that I recently raised with the arm64 kernel
maintainers upstream (please see [1] for details).

There I asked the maintainers for their views on two options. Either we
update the user-space tools, which currently use the memblocks listed in
'/proc/iomem' to obtain the value of PHYS_OFFSET (by reading the base of
the 1st memblock), so that they somehow read the value of
'memstart_addr' in user-space instead, or we make the change at the
kernel end and calculate 'memstart_addr' as:

        /*
         * Select a suitable value for the base of physical memory.
         */
        memstart_addr = round_down(memblock_start_of_DRAM(),
                                   ARM64_MEMSTART_ALIGN);
        if (!memstart_addr)
                memstart_addr = memblock_start_of_DRAM();

Let's wait for an update from the arm64 kernel maintainers, because I
think this change might be needed in other user-space tools as well
(if we decide to make the change on the user-space side), e.g.
makedumpfile in addition to kexec-tools, to correctly handle this
unique use-case where the value of memblock_start_of_DRAM() is less
than ARM64_MEMSTART_ALIGN.
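
As a rough illustration of why the two sides disagree by exactly 2M, the
sketch below redoes the rounding with the numbers from this thread. It is
not kernel or kexec-tools code: all function names are hypothetical, the
PAGE_OFFSET value is an assumption inferred from the dmesg line quoted
above, and the "alternative" helper reflects only one reading of the
kernel-side option (fall back to the start of DRAM when the rounded value
is 0).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed linear-map base, inferred from the quoted dmesg output. */
#define PAGE_OFFSET_ASSUMED	0xffff800000000000ULL

/* Round x down to a multiple of align; align must be a power of two. */
uint64_t round_down_u64(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Rounded-down base of DRAM, as arm64_memblock_init() is described above. */
uint64_t memstart_current(uint64_t dram_start, uint64_t memstart_align)
{
	return round_down_u64(dram_start, memstart_align);
}

/*
 * One reading of the kernel-side alternative (an assumption about the
 * intent): keep the rounded value unless it is 0, in which case use the
 * start of DRAM itself.
 */
uint64_t memstart_alternative(uint64_t dram_start, uint64_t memstart_align)
{
	uint64_t memstart = round_down_u64(dram_start, memstart_align);

	return memstart ? memstart : dram_start;
}

/* Linear-map translation for a given phys_offset (sketch only). */
uint64_t phys_to_linear_virt(uint64_t phys, uint64_t phys_offset)
{
	return (phys - phys_offset) + PAGE_OFFSET_ASSUMED;
}
```

With memblock_start_of_DRAM() = 0x200000 and ARM64_MEMSTART_ALIGN =
0x40000000, memstart_current() gives 0 while the first '/proc/iomem' base
is 0x200000. Translating PhysAddr 0x180000000 with phys_offset 0x200000
gives 0xffff80017fe00000 (the VirtAddr in the vmcore header), whereas
phys_offset 0 gives 0xffff800180000000.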

[1] https://www.spinics.net/lists/arm-kernel/msg655933.html

Thanks,
Bhupesh


>
>> -----Original Message-----
>> From: Bhupesh Sharma [mailto:[email protected]]
>> Sent: May 30, 2018 23:56
>> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>> <[email protected]>
>> Cc: [email protected]; [email protected]; [email protected];
>> Zheng, Joey <[email protected]>
>> Subject: [This email may be risky] Re: [PATCH] arm64: update PHYS_OFFSET to
>> conform to kernel
>>
>> On 05/30/2018 03:50 PM, Jin, Yanjiang wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Bhupesh Sharma [mailto:[email protected]]
>> >> Sent: May 30, 2018 16:39
>> >> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>> >> <[email protected]>
>> >> Cc: [email protected]; [email protected];
>> >> [email protected]; Zheng, Joey <[email protected]>
>> >> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>> >>
>> >> Hi Yanjiang,
>> >>
>> >> On 05/30/2018 01:09 PM, Jin, Yanjiang wrote:
>> >>>
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: Pratyush Anand [mailto:[email protected]]
>> >>>> Sent: May 30, 2018 12:16
>> >>>> To: Jin, Yanjiang <[email protected]>
>> >>>> Cc: [email protected]; [email protected];
>> >>>> [email protected]
>> >>>> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>> >>>>
>> >>>> Hi Yanjiang,
>> >>>>
>> >>>> On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>>> Hi Pratyush,
>> >>>>>
>> >>>>> Thanks for your help! but please see my reply inline.
>> >>>>>
>> >>>>
>> >>>> [...]
>> >>>>
>> >>>>>>> If an application, for example, vmcore-dmesg, wants to access
>> >>>>>>> the kernel symbol which is located in the last 2M address, it
>> >>>>>>> would fail with the below error:
>> >>>>>>>
>> >>>>>>>     "No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
>> >>>>>>
>> >>>>>> I think the fix might not be correct.
>> >>>>>>
>> >>>>>> The problem is in vmcore-dmesg, and that is what should be fixed, not kexec.
>> >>>>>> See here
>> >>>>>> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
>> >>>>>
>> >>>>> Firstly, for my patch, vmcore-dmesg is just an auxiliary
>> >>>>> application that helps to reproduce this issue. The function that
>> >>>>> generates vmcore is the root cause.
>> >>>>
>> >>>> ...and the function which generates vmcore is not kexec but rather
>> >>>> the secondary kernel.
>> >>>>
>> >>>>>
>> >>>>> On the other hand, vmcore-dmesg is part of kexec-tools; it has no
>> >>>>> standalone git repo. Even if we want to fix vmcore-dmesg, we still
>> >>>>> need to send the patch to kexec-tools, right?
>> >>>>
>> >>>> Sure. I meant the `kexec` application. We have three applications in
>> >>>> kexec-tools: `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump is
>> >>>> useless and we are going to get rid of it very soon.]
>> >>>>
>> >>>>>
>> >>>>> Yanjiang
>> >>>>>
>> >>>>>> How symbols are extracted from vmcore.
>> >>>>>>
>> >>>>>> You do have "NUMBER(PHYS_OFFSET)=" information in vmcore.
>> >>>>>>
>> >>>>>> You can probably look at the makedumpfile code to see how to
>> >>>>>> extract information from "NUMBER".
>> >>>>>
>> >>>>> I have looked at makedumpfile before; NUMBER(number) just reads a
>> >>>>> number from vmcore. But as I showed before, the root issue is that
>> >>>>> vmcore contains a wrong number. My patch fixes the generation of
>> >>>>> vmcore; we can't read vmcore at this point since we don't have a
>> >>>>> vmcore yet.
>> >>>>
>> >>>> ...and IIUC, you got correctly to the end of the secondary kernel,
>> >>>> where you tried vmcore-dmesg, and then you had the issue, right?
>> >>>>
>> >>>> How did you conclude that vmcore contains a wrong number? It's
>> >>>> unlikely, but if it does then we have a problem somewhere in the
>> >>>> Linux kernel, not here.
>> >>>
>> >>> Hi Pratyush,
>> >>>
>> >>> I think I have found the root cause. In the Linux kernel,
>> >>> memblock_mark_nomap() will reserve some memory ranges for EFI, such as
>> >>> EFI_RUNTIME_SERVICES_DATA and EFI_BOOT_SERVICES_DATA. In my environment,
>> >>> the first 2M of memory is EFI_RUNTIME_SERVICES_DATA, so it can't be seen
>> >>> by the kernel. We also can't set this EFI memory as "reserved"; only
>> >>> EFI_ACPI_RECLAIM_MEMORY memory can be set as "reserved" and seen by the
>> >>> kernel.
>> >>> So I don't think this is a kernel issue; we should fix it in kexec-tools.
>> >>> The kernel's call stack is attached for reference.
>> >>>
>> >>> drivers/firmware/efi/arm-init.c
>> >>>
>> >>> efi_init()->reserve_regions()->memblock_mark_nomap()
>> >>>
>> >>> Hi Bhupesh,
>> >>>
>> >>> I guess your environment has no EFI support, or the first memblock
>> >>> is not reserved for EFI, so you can't reproduce this issue.
>> >>
>> >> Perhaps you missed reading my earlier threads on the subject of
>> >> EFI_ACPI_RECLAIM_MEMORY regions being mapped as NOMAP and how it
>> >> causes the crashkernel to panic (please go through [1]).
>> >>
>> >> As of now we haven't found an acceptable-to-all solution for the issue,
>> >> and it needs to be fixed in 'kexec-tools' with a minor fix on the
>> >> kernel side as well.
>> >>
>> >> So, coming back to my environment details, it has both EFI support as
>> >> well as EFI ACPI RECLAIM regions.
>> >>
>> >> However we may be hitting a special case in your environment, so I
>> >> think before we can discuss your patch further (as both Pratyush and
>> >> myself have concerns with the same), would request you to share the
>> >> following:
>> >>
>> >> - output of kernel dmesg with 'efi=debug' added in the bootargs
>> >> (which will help us see how the memblocks are marked at your setup -
>> >> I am specifically interested in the logs after the line 'Processing
>> >> EFI memory map'),
>> >
>> > I investigated further on my board. I believe that the firmware design
>> > leads to the differences between our environments:
>> >
>> > My firmware defines the first two EFI blocks as below:
>> >
>> > Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
>> > Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>> >
>> > But the EFI API won't return the "EfiReservedMemType" memory to the Linux
>> > kernel for security reasons, so the kernel can't get any info about the
>> > first mem block; the kernel can only see region2, as below:
>> >
>> > efi: Processing EFI memory map:
>> > efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>> >
>> > # head -1 /proc/iomem
>> > 00200000-0021ffff : reserved
>>
>> I have the same case on boards at my end:
>>
>> # head -1 /proc/iomem
>> 00200000-0021ffff : reserved
>>
>> # dmesg | grep -i "Processing EFI memory map" -A 5
>> [    0.000000] efi: Processing EFI memory map:
>> [    0.000000] efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>> [    0.000000] efi:   0x000000400000-0x0000005fffff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |  |  |  |UC]
>> [    0.000000] efi:   0x000000800000-0x00000081ffff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |  |  |  |UC]
>> [    0.000000] efi:   0x000000820000-0x000001600fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>> [    0.000000] efi:   0x000001601000-0x0000027fffff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>
>> So no, your environment is not a special one (as I also use ATF as the
>> EL3 boot firmware); see more below ..
>>
>> > There are many EfiReservedMemType regions in ARM64 firmware if it
>> > supports TrustZone, but if the firmware doesn't put this type of memory
>> > region at the start of physical memory, this error won't happen. I don't
>> > think the firmware is at fault, since it can reserve any memory region;
>> > we'd better update kexec-tools.
>> > Anyway, reading memstart_addr from /dev/mem always gives a correct value
>> > if DEVMEM is defined.
>>
>> .. At my side with the latest upstream kernel (with commit
>> f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a reverted to allow the crashkernel
>> to boot while accessing ACPI tables) and the latest upstream kexec-tools, I
>> can boot the crashkernel properly, collect the vmcore properly and also
>> analyze the crash dump via tools like gdb and crash.
>>
>> So, I will also try the vmcore-dmesg tool and see if I get any issues
>> with it. Until then, you can try to see if there are any other obvious
>> differences in your environment which might be causing this issue at your
>> end.
>>
>> Thanks,
>> Bhupesh
>>
>>
>> >> - if you are using a public arm64 platform maybe you can share the
>> >> CONFIG file,
>> >> - output of 'cat /proc/iomem'
>> >>
>> >> [1] https://www.spinics.net/lists/arm-kernel/msg616632.html
>> >>
>> >> Thanks,
>> >> Bhupesh
>> >>
>> >>>> Have you tried to extract "PHYS_OFFSET" from vmcore either in
>> >>>> vmcore-dmesg or in makedumpfile and found that it does not match the
>> >>>> value of "PHYS_OFFSET" from the first kernel?
>> >>>>
>> >>>> In my understanding the flow is like this:
>> >>>>
>> >>>> - The first kernel will have a reserved area for the secondary kernel,
>> >>>> as well as for elfcore.
>> >>>> - The first kernel will embed all the vmcore information notes into
>> >>>> elfcore (see
>> >>>> crash_save_vmcoreinfo_init() -> arch_crash_save_vmcoreinfo()).
>> >>>> Therefore, we will have PHYS_OFFSET, kimage_voffset and VA_BITS
>> >>>> information for the first kernel in vmcore, which is in separate memory
>> >>>> and can be read by the second kernel.
>> >>>> - elfcore will also have notes about all the other physical memory
>> >>>> of the first kernel which needs to be copied by the second kernel.
>> >>>> - Now when a crash happens, the second kernel should have all the
>> >>>> required info for reading symbols from the first kernel's physical
>> >>>> memory, no?
>> >>>>
>> >>>>>
>> >>>>> NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
>> >>>>>
>> >>>>> Yanjiang
>> >>>>>
>> >>>>>>
>> >>>>>> Once you know the real PHYS_OFFSET (which could have been random
>> >>>>>> if KASLR is enabled), you can fix the problem you are seeing.
>> >>>>>
>> >>>>> I have validated both with and without KASLR; both worked well
>> >>>>> after applying my patch.
>> >>>>
>> >>>> IMHO, even if that works it does not mean that it's a good fix. We
>> >>>> should try to find the root cause. Moreover, you might not have
>> >>>> /dev/mem available in every configuration where KASLR is enabled.
>> >>>>
>> >>>> Regards
>> >>>> Pratyush
>> >>>
>> >>>
>> >>>
>> >>> This email is intended only for the named addressee. It may contain
>> >>> information that is confidential/private, legally privileged, or
>> >>> copyright-protected, and you should handle it accordingly. If you are
>> >>> not the intended recipient, you do not have legal rights to retain,
>> >>> copy, or distribute this email or its contents, and should promptly
>> >>> delete the email and all electronic copies in your system; do not
>> >>> retain copies in any media. If you have received this email in error,
>> >>> please notify the sender promptly. Thank you.
>> >>>
>> >>>

_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
