On 04/09/2020 03:26 PM, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>>> blocks as removable"), the user space interface to compute whether a memory
>>>> block can be offlined (exposed via
>>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>
>>>>
>>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>>> we'll start by:
>>>> 1. Testing if it contains any holes, and reject if so
>>>> 2. Testing if pages belong to different zones, and reject if so
>>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>>
>>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>>> it can easily result in false positives/negatives. Let's stop manually
>>>> checking is_mem_section_removable(), and let device_offline() handle it
>>>> completely instead. We can remove the racy is_mem_section_removable()
>>>> implementation next.
>>>>
>>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>>> zone lock when isolating), but maybe we should optimize that
>>>> implementation instead if this ever becomes a real problem (after all,
>>>> memory unplug is already an expensive operation). We started using
>>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>>> Implement memory hotplug remove in the kernel"), with the initial
>>>> hotremove support of lmbs.
>>>>
>>>> Cc: Nathan Fontenot <nf...@linux.vnet.ibm.com>
>>>> Cc: Michael Ellerman <m...@ellerman.id.au>
>>>> Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
>>>> Cc: Paul Mackerras <pau...@samba.org>
>>>> Cc: Michal Hocko <mho...@suse.com>
>>>> Cc: Andrew Morton <a...@linux-foundation.org>
>>>> Cc: Oscar Salvador <osalva...@suse.de>
>>>> Cc: Baoquan He <b...@redhat.com>
>>>> Cc: Wei Yang <richard.weiy...@gmail.com>
>>>> Signed-off-by: David Hildenbrand <da...@redhat.com>
>>>> ---
>>>>  .../platforms/pseries/hotplug-memory.c        | 26 +++----------------
>>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> index b2cde1732301..5ace2f9a277e 100644
>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>>>  
>>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>>  {
>>>> -  int i, scns_per_block;
>>>> -  bool rc = true;
>>>> -  unsigned long pfn, block_sz;
>>>> -  u64 phys_addr;
>>>> -
>>>>    if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>>            return false;
>>>>  
>>>> -  block_sz = memory_block_size_bytes();
>>>> -  scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>>> -  phys_addr = lmb->base_addr;
>>>> -
>>>>  #ifdef CONFIG_FA_DUMP
>>>>    /*
>>>>     * Don't hot-remove memory that falls in fadump boot memory area
>>>>     * and memory that is reserved for capturing old kernel memory.
>>>>     */
>>>> -  if (is_fadump_memory_area(phys_addr, block_sz))
>>>> +  if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>>            return false;
>>>>  #endif
>>>> -
>>>> -  for (i = 0; i < scns_per_block; i++) {
>>>> -          pfn = PFN_DOWN(phys_addr);
>>>> -          if (!pfn_in_present_section(pfn)) {
>>>> -                  phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -                  continue;
>>>> -          }
>>>> -
>>>> -          rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>>> -          phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -  }
>>>> -
>>>> -  return rc;
>>>> +  /* device_offline() will determine if we can actually remove this lmb */
>>>> +  return true;
>> So I think here swaps the check and do sequence. At least it breaks
>> dlpar_memory_remove_by_count(). It is doable to remove
>> is_mem_section_removable(), but here should be more effort to re-arrange
>> the code.
>>
> 
> Thanks Pingfan,
> 
> 1. "swaps the check and do sequence":
> 
> Partially. Any caller of dlpar_remove_lmb() already has to deal with
> false positives. device_offline() can easily fail after
> dlpar_remove_lmb() == true. It's inherently racy.
> 
> 2. "breaks dlpar_memory_remove_by_count()"
> 
> Can you elaborate why it "breaks" it? It will simply try to
> offline+remove lmbs, detect that it wasn't able to offline+remove as
> much as it wanted (which could happen before as well easily), and re-add
> the already offlined+removed ones.
> 
I overlooked the re-add logic. With that in mind, I think
dlpar_memory_remove_by_count() is OK with this patch.
> 3. "more effort to re-arrange the code"
> 
> What would be your suggestion?
> 
I had thought about merging the two for_each_drmem_lmb() loops and doing
the check inside the merged loop. But that is no longer necessary.

The only concern left is that the "if (lmbs_available < lmbs_to_remove)"
check no longer fails early, because of the weakened checking in
lmb_is_removable(). We may then do heavy page migration in
offline_pages(), only to find out that we cannot remove enough lmbs and
have to re-add the ones already removed.
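
To make the scenario concrete, here is a rough user-space sketch of the
flow as I understand it. It is not the kernel code: struct sim_lmb, its
fields, and the sim_*() helpers are hypothetical stand-ins, and the
"can_offline" flag simply models whether device_offline() would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drmem_lmb; not the kernel type. */
struct sim_lmb {
	bool assigned;     /* models the DRCONF_MEM_ASSIGNED flag */
	bool can_offline;  /* models whether offlining would succeed */
	bool reserved;     /* set while removed by the current request */
};

/* Weakened check: only the assigned flag is tested up front. */
static bool sim_lmb_is_removable(const struct sim_lmb *lmb)
{
	return lmb->assigned;
}

/*
 * Try to remove 'want' lmbs. Returns 0 on success and -1 on failure.
 * On failure, every lmb removed by this call is re-added.
 */
static int sim_remove_by_count(struct sim_lmb *lmbs, size_t n, size_t want)
{
	size_t available = 0, removed = 0, i;

	assert(lmbs != NULL);

	for (i = 0; i < n; i++)
		if (sim_lmb_is_removable(&lmbs[i]))
			available++;

	/* Early exit; rarely taken now that the pre-check is weak. */
	if (available < want)
		return -1;

	for (i = 0; i < n && removed < want; i++) {
		if (!sim_lmb_is_removable(&lmbs[i]) || !lmbs[i].can_offline)
			continue;  /* not a candidate, or offlining failed */
		lmbs[i].assigned = false;
		lmbs[i].reserved = true;
		removed++;
	}

	for (i = 0; i < n; i++) {
		if (!lmbs[i].reserved)
			continue;
		/* Shortfall detected late: re-add what we removed. */
		if (removed != want)
			lmbs[i].assigned = true;
		lmbs[i].reserved = false;
	}

	return removed == want ? 0 : -1;
}
```

With three assigned lmbs of which only one can be offlined, a request for
two passes the count check, removes one lmb, and then rolls it back. That
wasted offline work is the cost I was worried about.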

But I think this is a rare case, and hot-remove is not a frequent event
either. So it is worth simplifying the code with this patch.

Thanks for your clarification.

For [1/2]
Reviewed-by: Pingfan Liu <pi...@redhat.com>
