Hi Yanjiang,

On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
<[email protected]> wrote:
> Hi Pratyush,
>
> Thanks for your help, but please see my reply inline.
>

[...]

>> > If an application, for example, vmcore-dmesg, wants to access the
>> > kernel symbol which is located in the last 2M address, it would fail
>> > with the below error:
>> >
>> >   "No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
>>
>> I think the fix might not be correct.
>>
>> The problem is in vmcore-dmesg, and that is what should be fixed, not kexec.
>> See here
>> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
>
> Firstly, for my patch, vmcore-dmesg is just an auxiliary application that
> helps to reproduce this issue. The function that generates the vmcore is
> the root cause.

...and the function which generates the vmcore is not in kexec but
rather in the secondary kernel.

>
> On the other hand, vmcore-dmesg is part of kexec-tools; it has no standalone
> git repo. Even if we want to fix vmcore-dmesg, we still need to send the
> patch to kexec-tools, right?

Sure. I meant the `kexec` application. We have three applications in
kexec-tools: `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump is
useless and we are going to get rid of it very soon.]

>
> Yanjiang
>
>> How symbols are extracted from vmcore.
>>
>> You do have "NUMBER(PHYS_OFFSET)=" information in vmcore.
>>
>> You can probably look at the makedumpfile code to see how to extract
>> information from "NUMBER".
>
> I have seen makedumpfile before; NUMBER(number) just reads a number from
> the vmcore. But as I showed before, the root issue is that the vmcore
> contains a wrong number. My patch fixes the vmcore generation issue; we
> can't read the vmcore at this point since we don't have a vmcore yet.

...and IIUC, you were able to get all the way into the secondary
kernel, where you ran vmcore-dmesg and then hit the issue, right?

How did you conclude that the vmcore contains a wrong number? It's
unlikely, but if it does then we have a problem somewhere in the Linux
kernel, not here.

Have you tried to extract "PHYS_OFFSET" from the vmcore, either in
vmcore-dmesg or in makedumpfile, and found that it does not match the
value of "PHYS_OFFSET" from the first kernel?
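To make that check concrete, here is a minimal sketch of pulling a
"NUMBER(...)=" value out of the vmcoreinfo text so it can be compared
with the first kernel's value. It assumes the vmcoreinfo note has
already been read into a NUL-terminated buffer of "KEY=value" lines;
vmcoreinfo_read_ull() is a hypothetical helper name, not a function
from kexec-tools or makedumpfile. Base 0 is passed to strtoull() so
that both "0x..." and plain decimal values parse.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look for a line starting with "key=" in a
 * vmcoreinfo text blob and parse its value as an unsigned number.
 * Returns 0 on success, -1 if the key is missing or malformed.
 */
static int vmcoreinfo_read_ull(const char *info, const char *key,
			       unsigned long long *out)
{
	size_t klen;
	const char *line = info;

	assert(info && key && out);
	klen = strlen(key);

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			const char *val = line + klen + 1;
			char *end;
			unsigned long long v;

			errno = 0;
			v = strtoull(val, &end, 0);
			if (errno || end == val)
				return -1;	/* no digits or out of range */
			*out = v;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Calling it with the key "NUMBER(PHYS_OFFSET)" on the vmcoreinfo text
gives the value to compare against the PHYS_OFFSET of the first kernel.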

In my understanding, the flow is like this:

- The first kernel reserves an area for the secondary kernel, as well
as for the elfcore.
- The first kernel embeds all the vmcoreinfo notes into the elfcore
(see crash_save_vmcoreinfo_init() -> arch_crash_save_vmcoreinfo()).
Therefore, we will have the first kernel's PHYS_OFFSET, kimage_voffset
and VA_BITS information in the vmcore, which is in separate memory and
can be read by the second kernel.
- The elfcore will also have notes about all the other physical memory
of the first kernel which needs to be copied by the second kernel.
- Now, when a crash happens, the second kernel should have all the
required info for reading symbols from the first kernel's physical
memory, no?
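For the last step, a reader of the vmcore typically resolves a kernel
virtual address by finding the PT_LOAD program header that covers it
and turning the address into a file offset. A rough sketch of that
lookup follows; it assumes the Elf64_Phdr array has already been read
from the vmcore, and vaddr_to_file_offset() is a hypothetical name,
not copied from vmcore-dmesg. When no header covers the address, a
tool has nothing to read from, which is the situation the "No program
header covering vaddr" message describes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a virtual address to a file offset in the
 * vmcore by scanning the PT_LOAD program headers.
 * Returns 0 on success, -1 if no PT_LOAD segment covers vaddr.
 */
static int vaddr_to_file_offset(const Elf64_Phdr *phdr, size_t nr_phdr,
				uint64_t vaddr, uint64_t *offset)
{
	size_t i;

	assert(offset != NULL);

	for (i = 0; i < nr_phdr; i++) {
		const Elf64_Phdr *p = &phdr[i];

		if (p->p_type != PT_LOAD)
			continue;
		if (vaddr < p->p_vaddr)
			continue;
		/* Compare the distance, so p_vaddr + p_filesz cannot wrap. */
		if (vaddr - p->p_vaddr >= p->p_filesz)
			continue;

		*offset = p->p_offset + (vaddr - p->p_vaddr);
		return 0;
	}
	return -1;
}
```

If the p_vaddr values in the headers were computed from a different
PHYS_OFFSET than the one the first kernel actually used, a lookup like
this can miss addresses near the end of a segment.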

>
> NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
>
> Yanjiang
>
>>
>> Once you know the real PHYS_OFFSET (which could have been random if KASLR is
>> enabled), you can fix the problem you are seeing.
>
> I have validated both with and without KASLR; both worked well after
> applying my patch.

IMHO, even if that works, it does not mean that it is a good fix. We
should try to find the root cause. Moreover, you might not have /dev/mem
available in all the configurations where KASLR is enabled.

Regards
Pratyush

_______________________________________________
kexec mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/kexec
