On Mon, 6 Sep 2010 00:57:53 +0900
Minchan Kim <minchan....@gmail.com> wrote: 
> > 
> > Thanks,
> > -Kame
> > ==
> > From: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> > 
> > This patch adds a memory allocator for contiguous memory larger than
> > MAX_ORDER.
> > 
> >   alloc_contig_pages(hint, size, list);
> > 
> >   This function allocates 'size' bytes of contiguous pages whose physical
> >   address is higher than 'hint'. 'size' is specified in bytes.
> 
> size is in bytes, hint is a pfn?
> 

hint is a physical address. What's annoying me is x86-32: should I use
phys_addr_t or a pfn ....

> >   Allocated pages are all linked into the list and all of their page_count()
> >   are set to 1. Return value is the top page. 
> > 
> >  free_contig_pages(list)
> >  returns all pages in the list.
> > 
> > This patch does
> >   - find an area which can be ISOLATED.
> >   - migrate remaining pages in the area.
> 
> Migrate from there to where?
> 
Somewhere on LRU; the migration target is whatever alloc_page(GFP_HIGHUSER_MOVABLE)
returns (see hotremove_migrate_alloc() below).

> >   - steal chunk of pages from allocator.
> > 
> > Limitation is:
> >   - returned pages will be aligned to MAX_ORDER.
> >   - returned length of pages will be aligned to MAX_ORDER.
> >     (so, the caller may have to return tails of pages by itself.)
> 
> What do you mean tail?
> 
Ah, the allocator returns MAX_ORDER-aligned pages, then,

  [xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyy]
  x+y = allocated
  x = will be used.
  y = will be unused.

I call 'y' the tail here.


> >   - may allocate contiguous pages which overlap node/zones.
> 
> Hmm.. Do we really need this?
> 
Unnecessary; please consider this a BUG.

This code just checks the pfn of the allocated area but doesn't check
which zone/node the pfn belongs to.

For example, I hear IBM has the following kind of memory layout.

  | Node0 | Node1 | Node2 | Node0 | Node2 | Node1| .....

So, a check should be added to avoid allocating a chunk of pages that
spreads across multiple nodes.

(I hope walk_page_range() can do enough of the job for us, but I'm not sure.
 I need to add a "zone" check, at least.)




> > 
> > This is fully experimental and written as example.
> > (Maybe need more patches to make this complete.)
> 
> Yes. But my first impression of this patch is good.
> 
> > 
> > This patch moves some amount of codes from memory_hotplug.c to
> > page_isolation.c and based on page-offline technique used by
> > memory_hotplug.c 
> > 
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hir...@jp.fujitsu.com>
> > ---
> >  include/linux/page-isolation.h |   10 +
> >  mm/memory_hotplug.c            |   84 --------------
> >  mm/page_alloc.c                |   32 +++++
> >  mm/page_isolation.c            |  244 +++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 287 insertions(+), 83 deletions(-)
> > 
> > Index: mmotm-0827/mm/page_isolation.c
> > ===================================================================
> > --- mmotm-0827.orig/mm/page_isolation.c
> > +++ mmotm-0827/mm/page_isolation.c
> > @@ -3,8 +3,11 @@
> >   */
> >  
> >  #include <linux/mm.h>
> > +#include <linux/swap.h>
> >  #include <linux/page-isolation.h>
> >  #include <linux/pageblock-flags.h>
> > +#include <linux/mm_inline.h>
> > +#include <linux/migrate.h>
> >  #include "internal.h"
> >  
> >  static inline struct page *
> > @@ -140,3 +143,244 @@ int test_pages_isolated(unsigned long st
> >     spin_unlock_irqrestore(&zone->lock, flags);
> >     return ret ? 0 : -EBUSY;
> >  }
> > +
> > +#define CONTIG_ALLOC_MIGRATION_RETRY       (5)
> > +
> > +/*
> > + * Scanning pfn is much easier than scanning lru list.
> > + * Scan pfns from start to end and find the first LRU page.
> > + */
> > +unsigned long scan_lru_pages(unsigned long start, unsigned long end)
> > +{
> > +   unsigned long pfn;
> > +   struct page *page;
> > +   for (pfn = start; pfn < end; pfn++) {
> > +           if (pfn_valid(pfn)) {
> > +                   page = pfn_to_page(pfn);
> > +                   if (PageLRU(page))
> > +                           return pfn;
> > +           }
> > +   }
> > +   return 0;
> > +}
> > +
> > +/* Migrate all LRU pages in the range to somewhere else */
> > +static struct page *
> > +hotremove_migrate_alloc(struct page *page, unsigned long private, int **x)
> > +{
> > +   /* This should be improooooved!! */
> 
> Yeb. 
> 
> > +   return alloc_page(GFP_HIGHUSER_MOVABLE);
> > +}
> 
> <snip>
> 
> > +struct page *alloc_contig_pages(unsigned long long hint,
> > +           unsigned long size, struct list_head *list)
> > +{
> > +   unsigned long base, found, end, pages, start;
> > +   struct page *ret = NULL;
> > +   int nid, retry;
> > +
> > +   if (hint)
> > +           hint = ALIGN(hint, MAX_ORDER_NR_PAGES);
> > +   /* request size should be aligned to pageblock */
> > +   size >>= PAGE_SHIFT;
> > +   pages = ALIGN(size, MAX_ORDER_NR_PAGES);
> > +   found = 0;
> > +retry:
> > +   for_each_node_state(nid, N_HIGH_MEMORY) {
> > +           unsigned long node_end;
> > +           pg_data_t *node = NODE_DATA(nid);
> > +
> > +           node_end = node->node_start_pfn + node->node_spanned_pages;
> > +           /* does this node have proper range of memory ? */
> > +           if (node_end < hint + pages)
> > +                   continue;
> > +           base = hint;
> > +           if (base < node->node_start_pfn)
> > +                   base = node->node_start_pfn;
> > +
> > +           base = ALIGN(base, MAX_ORDER_NR_PAGES);
> > +           found = 0;
> > +           end = node_end & ~(MAX_ORDER_NR_PAGES -1);
> > +           /* Maybe we can use this Node */
> > +           if (base + pages < end)
> > +                   found = __find_contig_block(base, end, pages);
> > +           if (found) /* Found ? */
> > +                   break;
> > +           base = hint;
> > +   }
> > +   if (!found)
> > +           goto out;
> > +   /*
> > +    * Ok, here, we have contiguous pageblock marked as "isolated"
> > +    * try migration.
> > +    */
> > +   retry = CONTIG_ALLOC_MIGRATION_RETRY;
> > +   end = found + pages;
> 
> Hmm.. I can't understand the loop below.
> Maybe it needs refactoring.
> 
yes. It was just moved from the memory hotplug code.




> > +   for (start = scan_lru_pages(found, end); start < end;) {
> > +
> > +           if (do_migrate_range(found, end)) {
> > +                   /* migration failure ... */
> > +                   if (retry-- < 0)
> > +                           break;
> > +                   /* take a rest and synchronize LRU etc. */
> > +                   lru_add_drain_all();
> > +                   flush_scheduled_work();
> > +                   cond_resched();
> > +                   drain_all_pages();
> > +           }
> > +           start = scan_lru_pages(start, end);
> > +           if (!start)
> > +                   break;
> > +   }
> 
> <snip>
> 
> > +void alloc_contig_freed_pages(unsigned long pfn,  unsigned long end,
> > +   struct list_head *list)
> > +{
> > +   struct page *page;
> > +   struct zone *zone;
> > +   int i, order;
> > +
> > +   zone = page_zone(pfn_to_page(pfn));
> > +   spin_lock_irq(&zone->lock);
> > +   while (pfn < end) {
> > +           VM_BUG_ON(!pfn_valid(pfn));
> > +           page = pfn_to_page(pfn);
> > +           VM_BUG_ON(page_count(page));
> > +           VM_BUG_ON(!PageBuddy(page));
> > +           list_del(&page->lru);
> > +           order = page_order(page);
> > +           zone->free_area[order].nr_free--;
> > +           rmv_page_order(page);
> > +           __mod_zone_page_state(zone, NR_FREE_PAGES, - (1UL << order));
> > +           for (i = 0;i < (1 << order); i++) {
> > +                   struct page *x = page + i;
> > +                   list_add(&x->lru, list);
> > +           }
> > +           page += 1 << order;
>                 ^ pfn?
> 
yes, the patch on my tree has "pfn"....refresh miss :(.


> > +   }
> > +   spin_unlock_irq(&zone->lock);
> > +
> > +   /* After this, pages on the list can be freed one by one */
> > +   list_for_each_entry(page, list, lru)
> > +           prep_new_page(page, 0, 0);
> > +}
> > +
> >  #ifdef CONFIG_MEMORY_HOTREMOVE
> >  /*
> >   * All pages in the range must be isolated before calling this.
> > 
> 

Thank you for review.

Regards,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
