On Wed, Oct 23, 2013 at 05:01:32PM -0400, kosaki.motoh...@gmail.com wrote:
> From: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
> 
> Yasuaki Ithimatsu reported that memory hot-add took more than 5 _hours_
> on a 9TB memory machine, and we found that setup_zone_migrate_reserve
> accounted for >90% of that time.
> 
> The problem is that setup_zone_migrate_reserve scans all pageblocks
> unconditionally, but the scan is only necessary when the number of
> reserved blocks was reduced (i.e. memory hot remove).
> Moreover, the maximum number of MIGRATE_RESERVE blocks per zone is
> currently 2. This means the number of reserved pageblocks is almost
> always unchanged.
> 
> This patch adds zone->nr_migrate_reserve_block to maintain the number
> of MIGRATE_RESERVE pageblocks, which reduces the overhead of
> setup_zone_migrate_reserve dramatically.
> 

It seems regrettable to expand the size of struct zone just for this.
You are right that the number of blocks does not exceed 2 because of a
check made in setup_zone_migrate_reserve, so it should be possible to
special-case this. I didn't test this or think about it particularly
carefully, and no doubt there is a nicer way, but for illustration
purposes see the patch below.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd886fa..1aedddd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3897,6 +3897,8 @@ static int pageblock_is_reserved(unsigned long start_pfn, unsigned long end_pfn)
        return 0;
 }
 
+#define MAX_MIGRATE_RESERVE_BLOCKS 2
+
 /*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on min_wmark_pages(zone). The memory within
@@ -3910,6 +3912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
        struct page *page;
        unsigned long block_migratetype;
        int reserve;
+       int found = 0;
 
        /*
         * Get the start pfn, end pfn and the number of blocks to reserve
@@ -3926,11 +3929,11 @@ static void setup_zone_migrate_reserve(struct zone *zone)
        /*
         * Reserve blocks are generally in place to help high-order atomic
         * allocations that are short-lived. A min_free_kbytes value that
-        * would result in more than 2 reserve blocks for atomic allocations
-        * is assumed to be in place to help anti-fragmentation for the
-        * future allocation of hugepages at runtime.
+        * would result in more than MAX_MIGRATE_RESERVE_BLOCKS reserve blocks
+        * for atomic allocations is assumed to be in place to help
+        * anti-fragmentation for the future allocation of hugepages at runtime.
         */
-       reserve = min(2, reserve);
+       reserve = min(MAX_MIGRATE_RESERVE_BLOCKS, reserve);
 
        for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                if (!pfn_valid(pfn))
@@ -3956,6 +3959,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
                        /* If this block is reserved, account for it */
                        if (block_migratetype == MIGRATE_RESERVE) {
                                reserve--;
+                               found++;
                                continue;
                        }
 
@@ -3970,6 +3974,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
                        }
                }
 
+               /* If all possible reserve blocks have been found, we're done */
+               if (found >= MAX_MIGRATE_RESERVE_BLOCKS)
+                       break;
+
                /*
                 * If the reserve is met and this is a previous reserved block,
                 * take it back

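The early exit in the last hunk can be modeled in user space. The sketch
below is not kernel code: the enum, the macro and scan_reserved() are
hypothetical stand-ins for the pageblock migratetypes and the scan loop.
Like the diff, it counts only blocks that are already reserved. It assumes
a zone never holds more than two such blocks, so the walk can stop once
both have been seen.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pageblock migratetypes. */
enum block_type { BLOCK_MOVABLE, BLOCK_RESERVE, BLOCK_OTHER };

/* Assumed upper bound on reserved blocks per zone, as in the diff. */
#define MAX_RESERVE_BLOCKS 2

/*
 * Walk the blocks in order and stop as soon as MAX_RESERVE_BLOCKS
 * already-reserved blocks have been seen.
 *
 * Returns the number of blocks inspected; *found_out receives the
 * number of reserved blocks encountered before the walk stopped.
 */
static size_t scan_reserved(const enum block_type *blocks, size_t nr_blocks,
                            int *found_out)
{
        size_t visited = 0;
        int found = 0;

        for (size_t i = 0; i < nr_blocks; i++) {
                visited++;

                if (blocks[i] == BLOCK_RESERVE)
                        found++;

                /* All possible reserve blocks accounted for: stop early. */
                if (found >= MAX_RESERVE_BLOCKS)
                        break;
        }

        *found_out = found;
        return visited;
}
```

If both reserved blocks sit near the start of the zone, the walk ends after a
handful of blocks instead of covering every pageblock. If fewer than two
reserved blocks exist, it still visits every block, so this model only helps
in the common case where the reserve is already in place.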