James,

On Wed, Apr 25, 2018 at 02:22:07PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 10:20, AKASHI Takahiro wrote:
> > On Tue, Apr 24, 2018 at 05:08:57PM +0100, James Morse wrote:
> >> On 16/04/18 11:08, AKASHI Takahiro wrote:
> >>> On Thu, Apr 12, 2018 at 05:01:52PM +0100, James Morse wrote:
> >>>> On 05/04/18 03:42, AKASHI Takahiro wrote:
> >>>>> On Mon, Apr 02, 2018 at 10:53:32AM +0900, AKASHI Takahiro wrote:
> >>>>>> either because
> >>>>>> a. new kernel (or initrd/dtb) may have been allocated on a NOMAP region
> >>>>>>    which are not suitable for usable memory, or
> >>>>>> b. new kernel (or initrd/dtb) may have been allocated on a reserved region
> >>>>>>    whose contents can be overwritten.
> >>>>>>
> >>>>>> While we see (b) even today, (a) is a backward compatibility issue.
> >>>>
> >>>> (a) doesn't happen because request_standard_resources() checks
> >>>> memblock_is_nomap(), and reports those regions as 'reserved'.
> >>>
> >>> I might have confused you. The assumption here was that we adopt format (D),
> >>> where all NOMAP regions are sub-nodes of "System RAM", but still use
> >>> the current kexec-tools.
> >>> As I said above, this will result in unexpected behavior.
> >>
> >> I'd like to fix this without having to fix user-space at the same time. It looks
> >> like no-one else has second level reserved regions,
> > 
> > This was my assumption when I sent out a patch to kexec-tools.
> 
> But this would still leave user-space that isn't updated broken.
> 
> 
> >>>>> # I don't know yet whether people are happy with this fix; I also have
> >>>>>   kernel patches for my other approaches. Neither of them is very
> >>>>>   complicated.
> >>>>
> >>>> I don't think we should fix this in userspace, exporting all the
> >>>> memblock_reserved() regions as 'reserved' in /proc/iomem looks like the right
> >>>> thing to do.
> >>>
> >>> Again, if you modify /proc/iomem, you have to update kexec-tools, too.
> >>
> >> If we squash the memblock_reserved() stuff down so it appears as a top level
> >> 'reserved' region too, I don't think we do.
> > 
> > If I understand correctly, you're talking about my format (E).
> > As I said, it will fix the issue without modifying user-space, but
> > 
> > || This does not only look quite noisy but also ignores the fact that
> > || reserved regions are part of System RAM (or memblock.memory).
> 
> I agree it's noisy, there are significantly more 'reserved' areas, but these are
> all either nomap or memblock_reserved().
> 
> Why does it matter if a reserved-region is nomap or memblock_reserved()? Any new
> kernel will learn the difference from the EFI memory map and make its own
> decisions.

Yeah, the kernel can do that (though it won't look through the system
resource list for this purpose anyway), but what about kexec-like user
applications? Such an application may want to scan /proc/iomem to identify
all the *usable* memory on the system, that is "System RAM"; it doesn't care
whether some range is reserved (for whatever reason), yet it does care
whether a range is !NOMAP.
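
To make the user-space side concrete, here is a minimal sketch of how
a kexec-like tool could scan format (D) style output, where 'reserved'
entries are children of "System RAM". The sample text, the addresses and
the function names are made up for illustration; this is not kexec-tools
code, and it assumes children lie inside their parent entry.

```python
import re

# One /proc/iomem line: optional indentation, then "start-end : name" in hex.
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def subtract(region, holes):
    """Return the parts of `region` (inclusive bounds) not covered by `holes`."""
    start, end = region
    out = []
    for h_start, h_end in sorted(holes):
        if h_start > start:
            out.append((start, h_start - 1))
        start = max(start, h_end + 1)
    if start <= end:
        out.append((start, end))
    return out


def usable_ranges(iomem_text):
    """Collect top-level "System RAM" ranges minus their 'reserved' children."""
    result = []
    current = None   # the top-level System RAM entry being processed, if any
    holes = []       # 'reserved' children seen under `current`
    for line in iomem_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent = len(m.group(1))
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        name = m.group(4).strip()
        if indent == 0:
            # A new top-level entry closes the previous System RAM entry.
            if current is not None:
                result.extend(subtract(current, holes))
            current = (start, end) if name == "System RAM" else None
            holes = []
        elif current is not None and name == "reserved":
            holes.append((start, end))
    if current is not None:
        result.extend(subtract(current, holes))
    return result


# Hypothetical format (D) style listing, not taken from a real machine.
SAMPLE = """\
40000000-5fffffff : System RAM
  40080000-40ffffff : Kernel code
  41200000-413fffff : Kernel data
  5c000000-5c0fffff : reserved
60000000-6000ffff : reserved
80000000-8fffffff : System RAM
"""

for lo, hi in usable_ranges(SAMPLE):
    print(f"{lo:08x}-{hi:08x}")
```

A tool that only looks at top-level entries would treat the whole of
40000000-5fffffff as usable here, which is the unexpected behavior
I mentioned for format (D).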

> Kexec-tools only needs to know what it can overwrite without clobbering
> important data like the UEFI memory map, or the ACPI tables covered by the
> linear map.
> 
> 
> >> This prevents the efi-memory-map
> >> being overwritten on kernels since kexec was merged.
> >>
> >> It's horribly fiddly to do this. The kernel code/data are special reserved
> >> regions that we already describe as a subset of system-ram, even though they are
> >> both also fragments of a bigger memblock_reserved() block.
> > 
> > Actually, we don't have to avoid kernel code/data regions as copying
> > loaded data to the final destinations will be done at the very end of kexec.
> 
> For kexec yes, but that is the existing format of the file, which we shouldn't
> change, otherwise we break something else.

One minor downside of this approach is that a secondary kernel will be
loaded at an address different from that of the current kernel.
While this is sane, it looks a bit odd that, on every kexec, the new
kernel (code/data) moves back and forth :)

> 
> >> While we can walk memblock for regions that aren't reserved, allocating memory
> >> in the loop changes what is reserved. That one O(N) walk ends up being four...
> > 
> > At most O(n^2)?
> 
> I think for_each_free_mem_range() is smart enough not to do that. Patch incoming...

Yes, v9 of my kexec_file patch makes use of it.
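
For readers following along, the idea of walking "memory that is not
reserved" can be modelled as a merge-style walk over two sorted lists.
The sketch below is only a user-space model with made-up numbers and
a hypothetical function name; it is not the kernel's implementation of
for_each_free_mem_range().

```python
def free_ranges(memory, reserved):
    """Yield half-open (start, end) ranges in `memory` but not in `reserved`.

    Both inputs are assumed to be sorted lists of non-overlapping
    half-open (start, end) tuples.
    """
    r = 0
    for m_start, m_end in memory:
        cur = m_start
        # Skip reserved ranges that end at or before this memory region.
        while r < len(reserved) and reserved[r][1] <= cur:
            r += 1
        i = r
        # Carve out every reserved range that begins inside this region.
        while i < len(reserved) and reserved[i][0] < m_end:
            r_start, r_end = reserved[i]
            if r_start > cur:
                yield (cur, r_start)
            cur = max(cur, r_end)
            i += 1
        if cur < m_end:
            yield (cur, m_end)


# Made-up example: two memory banks, three reserved ranges.
memory = [(0, 100), (200, 300)]
reserved = [(10, 20), (90, 210), (250, 260)]
print(list(free_ranges(memory, reserved)))
```

Taking a snapshot with list() before allocating anything avoids the
problem James mentions, where allocating inside the loop changes what
is reserved while the walk is still in progress.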

Thanks,
-Takahiro AKASHI


> 
> Thanks,
> 
> James
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html