On Thu, Nov 19, 2020 at 06:25:29PM +0100, Nicolas Saenz Julienne wrote:
> On Thu, 2020-11-19 at 17:10 +0000, Catalin Marinas wrote:
> > On Thu, Nov 19, 2020 at 03:09:58PM +0100, Nicolas Saenz Julienne wrote:
> > > On Fri, 2020-11-13 at 11:29 +0000, Catalin Marinas wrote:
> > > [...]
> > > > > > > Let me stress that knowing the DMA constraints in the system 
> > > > > > > before reserving
> > > > > > > crashkernel's regions is necessary if we ever want it to work 
> > > > > > > seamlessly on all
> > > > > > > platforms. Be it small stuff like the Raspberry Pi or huge 
> > > > > > > servers with TB of
> > > > > > > memory.
> > > > > > 
> > > > > > Indeed. So we have 3 options (so far):
> > > > > > 
> > > > > > 1. Allow the crashkernel reservation to go into the linear map but 
> > > > > > set
> > > > > >    it to invalid once allocated.
> > > > > > 
> > > > > > 2. Parse the flattened DT (not sure what we do with ACPI) before
> > > > > >    creating the linear map. We may have to rely on some SoC ID here
> > > > > >    instead of actual DMA ranges.
> > > > > > 
> > > > > > 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for 
> > > > > > crashkernel
> > > > > >    reservations and not rely on arm64_dma_phys_limit in
> > > > > >    reserve_crashkernel().
> > > > > > 
> > > > > > I think (2) we tried hard to avoid. Option (3) brings us back to the
> > > > > > issues we had on large crashkernel reservations regressing on some
> > > > > > platforms (though it's been a while since, they mostly went quiet 
> > > > > > ;)).
> > > > > > However, with Chen's crashkernel patches we end up with two
> > > > > > reservations, one in the low DMA zone and one higher, potentially 
> > > > > > above
> > > > > > 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> > > > > > reservations than what we have now.
> > > > > > 
> > > > > > If (1) works, I'd go for it (James knows this part better than me),
> > > > > > otherwise we can go for (3).
> > > > > 
> > > > > Overall, I'd prefer (1) as well, and I'd be happy to have a got at 
> > > > > it. If not
> > > > > I'll append (3) in this series.
> > > > 
> > > > I think for 1 we could also remove the additional KEXEC_CORE checks,
> > > > something like below, untested:
> > > > 
> > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > > index 3e5a6913acc8..27ab609c1c0c 100644
> > > > --- a/arch/arm64/mm/mmu.c
> > > > +++ b/arch/arm64/mm/mmu.c
> > > > @@ -477,7 +477,8 @@ static void __init map_mem(pgd_t *pgdp)
> > > >         int flags = 0;
> > > >         u64 i;
> > > >  
> > > > -       if (rodata_full || debug_pagealloc_enabled())
> > > > +       if (rodata_full || debug_pagealloc_enabled() ||
> > > > +           IS_ENABLED(CONFIG_KEXEC_CORE))
> > > >                 flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > > >  
> > > >         /*
> > > > @@ -487,11 +488,6 @@ static void __init map_mem(pgd_t *pgdp)
> > > >          * the following for-loop
> > > >          */
> > > >         memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > > > -#ifdef CONFIG_KEXEC_CORE
> > > > -       if (crashk_res.end)
> > > > -               memblock_mark_nomap(crashk_res.start,
> > > > -                                   resource_size(&crashk_res));
> > > > -#endif
> > > >  
> > > >         /* map all the memory banks */
> > > >         for_each_mem_range(i, &start, &end) {
> > > > @@ -518,21 +514,6 @@ static void __init map_mem(pgd_t *pgdp)
> > > >         __map_memblock(pgdp, kernel_start, kernel_end,
> > > >                        PAGE_KERNEL, NO_CONT_MAPPINGS);
> > > >         memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> > > > -
> > > > -#ifdef CONFIG_KEXEC_CORE
> > > > -       /*
> > > > -        * Use page-level mappings here so that we can shrink the region
> > > > -        * in page granularity and put back unused memory to buddy 
> > > > system
> > > > -        * through /sys/kernel/kexec_crash_size interface.
> > > > -        */
> > > > -       if (crashk_res.end) {
> > > > -               __map_memblock(pgdp, crashk_res.start, crashk_res.end + 
> > > > 1,
> > > > -                              PAGE_KERNEL,
> > > > -                              NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
> > > > -               memblock_clear_nomap(crashk_res.start,
> > > > -                                    resource_size(&crashk_res));
> > > > -       }
> > > > -#endif
> > > >  }
> > > >  
> > > >  void mark_rodata_ro(void)
> > > 
> > > So as far as I'm concerned this is good enough for me. I took the time to
> > > properly test crashkernel on RPi4 using the series, this patch, and 
> > > another
> > > small fix to properly update /proc/iomem.
> > > 
> > > I'll send v7 soon, but before, James (or anyone for that matter) any 
> > > obvious
> > > push-back to Catalin's solution?
> > 
> > I talked to James earlier and he was suggesting that we check the
> > command line for any crashkernel reservations and only disable block
> > mappings in that case, see the diff below on top of the one I already
> > sent (still testing it).
> 
> That's even better :)
> 
> > If you don't have any other changes for v7, I'm happy to pick v6 up on
> > top of the no-block-mapping fix.
> 
> Yes I've got a small change in patch #1, the crashkernel reservation has to be
> performed before request_standart_resouces() is called, which is OK, since
> we're all setup by then, I moved the crashkernel reservation at the end of
> bootmem_init(). I attached the patch. If it's easier for you I'll send v7.

Please send a v7, otherwise b4 gets confused.

Thanks.

-- 
Catalin
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to