Hi,

Sorry to reply late due to a long festival.

I have tested this patch against v4.15 and latest kernel with small
modification to meet the situation we discussed here. Both work fine.

The below is the modification of two kernel

test1. latest kernel with two extra modification to expose the problem
-1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
(mm/sparse.c: reset section's mem_map when fully deactivated), this
commit work around this bug
-1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
write correct address of mem_section into vmcoreinfo"). This will create
a buggy situation as we discussed here.
-1.3. fix building bug due to revert
a0b1280368d1e91ab72f849ef095b4f07a39bbf1

test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
-2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
write correct address of mem_section into vmcoreinfo")

So I can not see any problem with my patch.
Maybe I misunderstand the discussion, but I can not see my original
patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.

Thanks,
Pingfan

On 01/29/2020 03:33 AM, Thadeu Lima de Souza Cascardo wrote:
> On Tue, Jan 28, 2020 at 05:03:12PM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> Hi Cascardo,
>>
>>> -----Original Message-----
>>> On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo 
>>> wrote:
>>>> Sorry for taking too long to respond, as I was on vacation.
>>>>
>>>> The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are
>>>> not supported anymore. In a way that it's even hard for me to test them.
>>>>
>>>> However, I managed to test it, and those two lines are definitively needed
>>>> to dump a VM running such a kernel. Is removing them really needed to fix
>>>> this issue?
>>>>
>>>> Otherwise, I would rather keep them.
>>>>
>>>> Thanks.
>>>> Cascardo.
>>>
>>> By the way, I was too fast in sending this. We really need to keep those 
>>> lines
>>> as makedumpfile will fail to dump a 4.4 kernel with this patch as is.
>>
>> Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1?
>> Could you elaborate on how it fails?
> 
> No, it doesn't have either, so my guess is it would fail on upstream 4.4 as
> well, so anything that doesn't have 83e3c48729d9.
> 
> That's what I get on that 4.4 kernel (4.4.0-171-generic):
> 
> # ./makedumpfile /proc/vmcore ../dump
> get_mem_section: Could not validate mem_section.
> get_mm_sparsemem: Can't get the address of mem_section.
> 
> makedumpfile Failed.
> #
> 
> So, now, I have a better grasp of the whole logic, and understand why it fails
> with this patch.
> 
> So, we need to either interpret the mem_section as a pointer to the array of a
> pointer to the pointer to the array. The only case the second option is valid
> is when sparse_extreme is on, so we don't even need to check the second case
> when it's off.
> 
> Then, we check that interpreting it either way is valid. If it's valid in both
> interpretations, we can't decide which to use, and will fail. So far, we
> haven't seen any case in the field where that would accidentally happen. But 
> in
> case it does, we should rather fail to dump and fallback to copying, than
> creating a bogus compressed dump.
> 
> When this patch is applied, a kernel which does not have 83e3c48729d9, and
> thus, has mem_section as a direct pointer to the array, it so happens that we
> don't detect the pointer to pointer to the array case as invalid, thus failing
> to dump.
> 
> The way we validate is that the mem_maps should either have the PRESENT bit or
> be NULL. Now, that assumption may be invalid, and we would need to do the
> validation some other way. I can test the cases where that assumption is
> invalid in a 4.4 kernel and see how to fix this in a satisfactory way.
> 
> Going through the code once again, I don't see how the second section of the
> patch would be correct by itself too. I will think it through a little more 
> and
> see if I can come up with a solution.
> 
> Regards.
> Cascardo.
> 
>>
>> I'm thinking that Pingfan's thought may help:
>>>> I think it could be if/else, no need to call twice.
>>
>> Thanks,
>> Kazu
>>
>>>
>>> Cascardo.
> 


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Reply via email to