On Fri, Jun 27, 2014 at 08:25:37PM +0100, Mel Gorman wrote:
> On Fri, Jun 27, 2014 at 02:57:00PM -0400, Johannes Weiner wrote:
> > On Fri, Jun 27, 2014 at 09:14:39AM +0100, Mel Gorman wrote:
> > > And the number of pages allocated from each zone is comparable
> > > 
> > >                             3.16.0-rc2  3.16.0-rc2
> > >                               checklow    fairzone
> > > DMA allocs                           0           0
> > > DMA32 allocs                   7374217     7920241
> > > Normal allocs                999277551   996568115
> > 
> > Wow, the DMA32 zone gets less than 1% of the allocations.  What are
> > the zone sizes in this machine?
> > 
> 
>         managed  3976
>         managed  755409
>         managed  1281601

Something seems way off with this.  On my system here, the DMA32 zone
makes up for 20% of managed pages and it gets roughly 20% of the page
allocations, as I would expect.

Your DMA32 zone makes up for 37% of the managed pages and receives
merely 0.7% of the page allocations.  Unless a large portion of that
zone is somehow unreclaimable, fairness seems completely obliberated
in both kernels.

Is that checklow's doing?

> > > @@ -3287,10 +3287,18 @@ void show_free_areas(unsigned int filter)
> > >   show_swap_cache_info();
> > >  }
> > >  
> > > -static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
> > > +static int zoneref_set_zone(pg_data_t *pgdat, struct zone *zone,
> > > +                 struct zoneref *zoneref, struct zone *preferred_zone)
> > >  {
> > > + int zone_type = zone_idx(zone);
> > > + bool fair_enabled = zone_local(zone, preferred_zone);
> > > + if (zone_type == 0 &&
> > > +                 zone->managed_pages < (pgdat->node_present_pages >> 4))
> > > +         fair_enabled = false;
> > 
> > This needs a comment.
> > 
> 
>         /*
>          * Do not count the lowest zone as of relevance to the fair zone
>          * allocation policy if it's a small percentage of the node
>          */
> 
> However, as I write this I'll look at getting rid of this entirely. It
> made some sense when fair_eligible was tracked on a per-zone basis but
> it's more complex than necessary.
>
> > >   zoneref->zone = zone;
> > > - zoneref->zone_idx = zone_idx(zone);
> > > + zoneref->zone_idx = zone_type;
> > > + return fair_enabled;
> > >  }
> > >  
> > >  /*
> > > @@ -3303,17 +3311,26 @@ static int build_zonelists_node(pg_data_t *pgdat, 
> > > struct zonelist *zonelist,
> > >  {
> > >   struct zone *zone;
> > >   enum zone_type zone_type = MAX_NR_ZONES;
> > > + struct zone *preferred_zone = NULL;
> > > + int nr_fair = 0;
> > >  
> > >   do {
> > >           zone_type--;
> > >           zone = pgdat->node_zones + zone_type;
> > >           if (populated_zone(zone)) {
> > > -                 zoneref_set_zone(zone,
> > > -                         &zonelist->_zonerefs[nr_zones++]);
> > > +                 if (!preferred_zone)
> > > +                         preferred_zone = zone;
> > > +
> > > +                 nr_fair += zoneref_set_zone(pgdat, zone,
> > > +                         &zonelist->_zonerefs[nr_zones++],
> > > +                         preferred_zone);
> > 
> > Passing preferred_zone to determine locality seems pointless when you
> > walk the zones of a single node.
> > 
> 
> True.
> 
> > And the return value of zoneref_set_zone() is fairly unexpected.
> > 
> 
> How so?

Given the name zoneref_set_zone(), I wouldn't expect any return value,
or a success/failure type return value at best - certainly not whether
the passed zone is eligible for the fairness policy.

> > It's probably better to determine fair_enabled in the callsite, that
> > would fix both problems, and write a separate helper that tests if a
> > zone is eligible for fair treatment (type && managed_pages test).
> > 
> 
> Are you thinking of putting that into the page allocator fast path? I'm
> trying to take stuff out of there :/.

Not at all, I was just suggesting to restructure the code for building
the zonelists, and move the fairness stuff out of zoneref_set_zone().

If you remove the small-zone exclusion as per above, this only leaves
the locality check when building the zonelist in zone order and that
can easily be checked inline in build_zonelists_in_zone_order().

build_zonelists_node() can just count every populated zone in nr_fair.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to