On Sat, Jun 2, 2018 at 3:20 AM, Bhupesh Sharma <[email protected]> wrote:
> Hi Yanjiang,
>
> Thanks, the description of the issue is clearer now.
>
> Also, I managed to fix my Qualcomm board to reproduce this issue.
> Please see more comments inline:
>
> On Thu, May 31, 2018 at 11:01 AM, Jin, Yanjiang
> <[email protected]> wrote:
>> Hi Bhupesh,
>>
>> 1. To be clear, I have listed my memory layout again here:
>>
>> In the first kernel, execute the below command to get the end of the
>> kernel's virtual memory range:
>>
>> #dmesg | grep memory
>> ..........
>> memory  : 0xffff800000200000 - 0xffff801800000000
>>
>> Then use readelf to get the last program header from vmcore:
>>
>> # readelf -l vmcore
>>
>> ELF Header:
>> ........................
>>
>> Program Headers:
>>   Type           Offset             VirtAddr           PhysAddr
>>                  FileSiz            MemSiz              Flags  Align
>>   ...
>>   LOAD           0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>>                  0x0000001680000000 0x0000001680000000  RWE    0
>>
>> Do a simple calculation:
>>
>> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
>> 0xffff8017ffe00000 != 0xffff801800000000.
>>
>> The end of the virtual memory range mismatches between vmlinux and
>> vmcore. If you do the same three steps, I think you will get the same
>> result as mine.
>>
>>
>> 2. But why can't you reproduce my issue? The reason is that the address
>> of the symbol "log_buf" is located in the last 2M on my board.
>> I guess it isn't in the last 2M on your environment, so we get
>> different vmcore-dmesg results.
>> You can simply check the log_buf’s address through crash as below:
>>
>> crash> print log_buf
>> $1 = 0xffff8017ffe90000 ""
>>
>> In vmcore-dmesg.c, the function dump_dmesg_structured() wants to get the
>> log_buf offset through the below code:
>>
>> log_buf = read_file_pointer(fd, vaddr_to_offset(log_buf_vaddr));
>> log_buf_offset = vaddr_to_offset(log_buf);
>>
>> The error happens in vaddr_to_offset(); it reports the below error on my board:
>> "No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
>>
>> If I adjust my memory layout so that log_buf is not in the last 2M,
>> vmcore-dmesg succeeds. But the issue still exists: the vmlinux and
>> vmcore layouts mismatch.
>>
>> log_buf being in the last 2M is not common, but it does happen on my board.
>>
>>
>> 3. Now let's go back to the code itself. Whether or not we can
>> reproduce this bug, the phys_offset issue in the code exists.
>>
>> In the kernel, arm64_memblock_init() calls round_down() to recalculate
>> memstart_addr:
>>
>> memstart_addr = round_down(memblock_start_of_DRAM(),  ARM64_MEMSTART_ALIGN);
>>
>> memblock_start_of_DRAM() is 0x200000; it is the first memblock's base.
>> ARM64_MEMSTART_ALIGN is 0x40000000 on my board.
>>
>> So memstart_addr is 0, and phys_offset = memstart_addr = 0;
>>
>> But in kexec-tools:
>> phys_offset is set in the function get_memory_ranges_iomem_cb() :
>>
>> get_memory_ranges_iomem_cb()->set_phys_offset().
>>
>> This function just takes the first memblock's base (the first block in
>> "/proc/iomem"); there is no round_down() operation.
>>
>> To align with the kernel, kexec-tools should call a similar round_down()
>> on this base. But obviously, kexec-tools doesn't do this step.
>> It's hard to get the kernel's round_down() parameters in kexec-tools, but
>> reading memstart_addr's value via /dev/mem is safe; we can always get the
>> correct value regardless of whether KASLR is enabled.
>
> The problem statement is clearer now (thanks for detailing the
> environment in your last email).
>
> I think I understand the issue with 'memstart_addr' being 0; it is
> part of a few KASLR-related queries (although a few of them are valid
> for the non-KASLR case as well) that I have recently asked the arm64
> kernel maintainers upstream (please see [1] for details).
>
> There I have asked the maintainers about their views regarding whether
> we should update the user-space tools which use the memblocks listed
> in '/proc/iomem' to obtain the value of PHYS_OFFSET (by reading the
> base of the 1st memblock) and somehow read the value of 'memstart_addr'
> in user-space to get the PHYS_OFFSET, or whether the change should be
> done at the kernel end to calculate 'memstart_addr' as:
>
>         /*
>          * Select a suitable value for the base of physical memory.
>          */
>         memstart_addr = round_down(memblock_start_of_DRAM(),
>                                    ARM64_MEMSTART_ALIGN);
>         if (memstart_addr)

Sorry for the typo: I meant if (!memstart_addr) above

Thanks,
Bhupesh

>                 memstart_addr = memblock_start_of_DRAM();
>
> Let's wait for an update from the arm64 kernel maintainers, because I
> think this change might be needed in other user-space tools (if we
> decide to make the change on the user-space side), e.g. makedumpfile in
> addition to kexec-tools, to correctly handle this unique use case where
> the value of memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN.
>
> [1] https://www.spinics.net/lists/arm-kernel/msg655933.html
>
> Thanks,
> Bhupesh
>
>
>>
>>> -----Original Message-----
>>> From: Bhupesh Sharma [mailto:[email protected]]
>>> Sent: May 30, 2018 23:56
>>> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>>> <[email protected]>
>>> Cc: [email protected]; [email protected]; [email protected];
>>> Zheng, Joey <[email protected]>
>>> Subject: [This email may be risky] Re: [PATCH] arm64: update PHYS_OFFSET
>>> to conform to kernel
>>>
>>> On 05/30/2018 03:50 PM, Jin, Yanjiang wrote:
>>> >
>>> >
>>> >> -----Original Message-----
>>> >> From: Bhupesh Sharma [mailto:[email protected]]
>>> >> Sent: May 30, 2018 16:39
>>> >> To: Jin, Yanjiang <[email protected]>; Pratyush Anand
>>> >> <[email protected]>
>>> >> Cc: [email protected]; [email protected];
>>> >> [email protected]; Zheng, Joey <[email protected]>
>>> >> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>>> >>
>>> >> Hi Yanjiang,
>>> >>
>>> >> On 05/30/2018 01:09 PM, Jin, Yanjiang wrote:
>>> >>>
>>> >>>
>>> >>>> -----Original Message-----
>>> >>>> From: Pratyush Anand [mailto:[email protected]]
>>> >>>> Sent: May 30, 2018 12:16
>>> >>>> To: Jin, Yanjiang <[email protected]>
>>> >>>> Cc: [email protected]; [email protected];
>>> >>>> [email protected]
>>> >>>> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel
>>> >>>>
>>> >>>> Hi Yanjiang,
>>> >>>>
>>> >>>> On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
>>> >>>> <[email protected]>
>>> >>>> wrote:
>>> >>>>> Hi Pratyush,
>>> >>>>>
>>> >>>>> Thanks for your help! But please see my reply inline.
>>> >>>>>
>>> >>>>
>>> >>>> [...]
>>> >>>>
>>> >>>>>>> If an application, for example, vmcore-dmesg, wants to access
>>> >>>>>>> a kernel symbol which is located at an address in the last 2M,
>>> >>>>>>> it would fail with the below error:
>>> >>>>>>>
>>> >>>>>>>     "No program header covering vaddr 0xffff8017ffe90000 found
>>> >>>>>>>     kexec bug?"
>>> >>>>>>
>>> >>>>>> I think the fix might not be correct.
>>> >>>>>>
>>> >>>>>> The problem is in vmcore-dmesg, and that should be fixed, not
>>> >>>>>> kexec.
>>> >>>>>> See here
>>> >>>>>> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-
>>> >>>>>> tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
>>> >>>>>
>>> >>>>> Firstly, for my patch, vmcore-dmesg is just an auxiliary
>>> >>>>> application to help reproduce this issue. The function which
>>> >>>>> generates vmcore is the root cause.
>>> >>>>
>>> >>>> ...and the function which generates vmcore is not kexec but
>>> >>>> rather the secondary kernel.
>>> >>>>
>>> >>>>>
>>> >>>>> On the other hand, vmcore-dmesg is under kexec-tools; it has no
>>> >>>>> standalone git repo. Even if we want to fix vmcore-dmesg, we
>>> >>>>> still need to send the patch to kexec-tools, right?
>>> >>>>
>>> >>>> Sure. I meant the `kexec` application. We have three applications
>>> >>>> in kexec-tools: `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump
>>> >>>> is useless and we are going to get rid of it very soon.]
>>> >>>>
>>> >>>>>
>>> >>>>> Yanjiang
>>> >>>>>
>>> >>>>>> That shows how symbols are extracted from vmcore.
>>> >>>>>>
>>> >>>>>> You do have "NUMBER(PHYS_OFFSET)=" information in vmcore.
>>> >>>>>>
>>> >>>>>> You can probably look at the makedumpfile code to see how to
>>> >>>>>> extract information from "NUMBER".
>>> >>>>>
>>> >>>>> I have looked at makedumpfile before; NUMBER(number) just reads
>>> >>>>> a number from vmcore. But as I showed before, the root issue is
>>> >>>>> that vmcore contains a wrong number. My patch fixes the vmcore
>>> >>>>> generation issue; we can't read vmcore at this point since we
>>> >>>>> don't have a vmcore yet.
>>> >>>>
>>> >>>> ...and IIUC, you were able to get correctly to the end of the
>>> >>>> secondary kernel, where you tried vmcore-dmesg and then hit the
>>> >>>> issue, right?
>>> >>>>
>>> >>>> How did you conclude that vmcore contains a wrong number? It's
>>> >>>> unlikely, but if it does then we have a problem somewhere in the
>>> >>>> Linux kernel, not here.
>>> >>>
>>> >>> Hi Pratyush,
>>> >>>
>>> >>> I think I have found the root cause. In the Linux kernel,
>>> >>> memblock_mark_nomap() will reserve some memory ranges for EFI, such
>>> >>> as EFI_RUNTIME_SERVICES_DATA and EFI_BOOT_SERVICES_DATA. On my
>>> >>> environment, the first 2M of memory is EFI_RUNTIME_SERVICES_DATA,
>>> >>> so it can't be seen in the kernel. We also can't set this EFI
>>> >>> memory as "reserved"; only EFI_ACPI_RECLAIM_MEMORY memory can be
>>> >>> set as "reserved" and seen in the kernel.
>>> >>> So I don't think this is a kernel issue; we should fix it in
>>> >>> kexec-tools. I attach the kernel's call stack for reference.
>>> >>>
>>> >>> drivers/firmware/efi/arm-init.c
>>> >>>
>>> >>> efi_init()->reserve_regions()->memblock_mark_nomap()
>>> >>>
>>> >>> Hi Bhupesh,
>>> >>>
>>> >>> I guess your environment has no EFI support, or its first memblock
>>> >>> is not reserved for EFI, so you can't reproduce this issue.
>>> >>
>>> >> Perhaps you missed reading my earlier threads on the subject of
>>> >> EFI_ACPI_RECLAIM_MEMORY regions being mapped as NOMAP and how that
>>> >> causes the crash kernel to panic (please go through [1]).
>>> >>
>>> >> As of now we haven't found an acceptable-to-all solution for the
>>> >> issue, and it needs to be fixed in 'kexec-tools' with a minor fix on
>>> >> the kernel side as well.
>>> >>
>>> >> So, coming back to my environment details, it has both EFI support as
>>> >> well as EFI ACPI RECLAIM regions.
>>> >>
>>> >> However, we may be hitting a special case in your environment, so I
>>> >> think before we can discuss your patch further (as both Pratyush and
>>> >> I have concerns with it), I would request you to share the following:
>>> >>
>>> >> - the output of kernel dmesg with 'efi=debug' added to the bootargs
>>> >> (which will help us see how the memblocks are marked on your setup;
>>> >> I am specifically interested in the logs after the line 'Processing
>>> >> EFI memory map'),
>>> >
>>> > I made more investigation on my board. I believe that the firmware
>>> > design leads to these differences between our environments:
>>> >
>>> > My firmware defines the first two EFI blocks as below:
>>> >
>>> > Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
>>> > Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>>> >
>>> > But the EFI API won't return the "EfiReservedMemType" memory to the
>>> > Linux kernel for security reasons, so the kernel can't get any info
>>> > about the first memblock; the kernel can only see region 2, as below:
>>> >
>>> > efi: Processing EFI memory map:
>>> > efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> >
>>> > # head -1 /proc/iomem
>>> > 00200000-0021ffff : reserved
>>>
>>> I have the same case on boards at my end:
>>>
>>> # head -1 /proc/iomem
>>> 00200000-0021ffff : reserved
>>>
>>> # dmesg | grep -i "Processing EFI memory map" -A 5
>>> [    0.000000] efi: Processing EFI memory map:
>>> [    0.000000] efi:   0x000000200000-0x00000021ffff [Runtime Data
>>> |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x000000400000-0x0000005fffff [ACPI Memory NVS
>>> |   |  |  |  |  |  |  |   |  |  |  |UC]
>>> [    0.000000] efi:   0x000000800000-0x00000081ffff [ACPI Memory NVS
>>> |   |  |  |  |  |  |  |   |  |  |  |UC]
>>> [    0.000000] efi:   0x000000820000-0x000001600fff [Conventional
>>> Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x000001601000-0x0000027fffff [Loader Data
>>> |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>
>>> So, no, your environment is not a special one (as I also use ATF as
>>> the EL3 boot firmware); see more below ..
>>>
>>> > There are many EfiReservedMemType regions in ARM64 firmware if it
>>> > supports TrustZone, but if a firmware doesn't put this type of memory
>>> > region at the start of physical memory, this error wouldn't happen. I
>>> > don't think the firmware is in error, since it can reserve any memory
>>> > region; we'd better update kexec-tools.
>>> > Anyway, reading memstart_addr from /dev/mem can always get a correct
>>> > value if DEVMEM is defined.
>>>
>>> .. At my side with the latest upstream kernel (with commit
>>> f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a reverted to allow the crash
>>> kernel to boot while accessing ACPI tables) and the latest upstream
>>> kexec-tools, I can boot the crash kernel properly, collect the vmcore
>>> properly, and analyze the crash dump via tools like gdb and crash as
>>> well.
>>>
>>> So, I will also try the vmcore-dmesg tool and see if I get any issues
>>> with it. Till then, you can check whether there are any other obvious
>>> differences in your environment which might be causing this issue at
>>> your end.
>>>
>>> Thanks,
>>> Bhupesh
>>>
>>>
>>> >> - if you are using a public arm64 platform, maybe you can share the
>>> >> CONFIG file,
>>> >> - the output of 'cat /proc/iomem'
>>> >>
>>> >> [1] https://www.spinics.net/lists/arm-kernel/msg616632.html
>>> >>
>>> >> Thanks,
>>> >> Bhupesh
>>> >>
>>> >>>> Have you tried to extract "PHYS_OFFSET" from vmcore, either in
>>> >>>> vmcore-dmesg or in makedumpfile, and found it not matching the
>>> >>>> value of "PHYS_OFFSET" from the first kernel?
>>> >>>>
>>> >>>> In my understanding the flow is like this:
>>> >>>>
>>> >>>> - The first kernel will have a reserved area for the secondary
>>> >>>> kernel, as well as for the elfcore.
>>> >>>> - The first kernel will embed all the vmcore information notes into
>>> >>>> the elfcore (see crash_save_vmcoreinfo_init() ->
>>> >>>> arch_crash_save_vmcoreinfo()). Therefore, we will have the first
>>> >>>> kernel's PHYS_OFFSET, kimage_voffset and VA_BITS information in
>>> >>>> vmcore, which is in separate memory and can be read by the second
>>> >>>> kernel.
>>> >>>> - The elfcore will also have notes about all the other physical
>>> >>>> memory of the first kernel which needs to be copied by the second
>>> >>>> kernel.
>>> >>>> - Now when a crash happens, the second kernel should have all the
>>> >>>> required info for reading symbols from the first kernel's physical
>>> >>>> memory, no?
>>> >>>>
>>> >>>>>
>>> >>>>> NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
>>> >>>>>
>>> >>>>> Yanjiang
>>> >>>>>
>>> >>>>>>
>>> >>>>>> Once you know the real PHYS_OFFSET (which could have been
>>> >>>>>> randomized if KASLR is enabled), you can fix the problem you are
>>> >>>>>> seeing.
>>> >>>>>
>>> >>>>> I have validated both with and without KASLR; all of them worked
>>> >>>>> well after applying my patch.
>>> >>>>
>>> >>>> IMHO, even if that works, it does not mean that it's a good fix.
>>> >>>> We should try to find the root cause. Moreover, you might not have
>>> >>>> /dev/mem available in all configurations where KASLR is enabled.
>>> >>>>
>>> >>>> Regards
>>> >>>> Pratyush
>>> >>>
>>> >>>
>>> >>>
>>> >>> This email is intended only for the named addressee. It may contain
>>> >> information that is confidential/private, legally privileged, or
>>> >> copyright-protected, and you should handle it accordingly. If you are
>>> >> not the intended recipient, you do not have legal rights to retain,
>>> >> copy, or distribute this email or its contents, and should promptly
>>> >> delete the email and all electronic copies in your system; do not
>>> >> retain copies in any media. If you have received this email in error, 
>>> >> please
>>> notify the sender promptly. Thank you.
>>> >>>
>>> >>>
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>
>>
>>
>>
>>
>>

_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
