On 20 March 2014 13:41, Dietmar Eggemann <dietmar.eggem...@arm.com> wrote:
> On 19/03/14 16:22, Vincent Guittot wrote:
>> We replace the old way to configure the scheduler topology with a new
>> method which enables a platform to declare additional levels (if needed).
>>
>> We still have a default topology table definition that can be used by
>> platforms that don't want more levels than the SMT, MC, CPU and NUMA
>> ones. This table can be overwritten by an arch which either wants to add
>> a new level where load balancing makes sense, like a BOOK or power-gating
>> level, or wants to change the flags configuration of some levels.
>>
>> For each level, we need a function pointer that returns the cpumask for
>> each cpu, a function pointer that returns the flags for the level, and a
>> name. Only flags that describe the topology can be set by an
>> architecture. The current topology flags are:
>> SD_SHARE_CPUPOWER
>> SD_SHARE_PKG_RESOURCES
>> SD_NUMA
>> SD_ASYM_PACKING
>>
>> Then, each level must be a subset of the next one. The build sequence of
>> the sched_domain will take care of removing useless levels, like those
>> with 1 CPU and those with the same CPU span and the same relevant
>> load-balancing information as their child.
>
> The paragraph above contains important information to set this up
> correctly, that's why it might be worth clarifying:
>
> - "next one" of sd means "child of sd" ?

It's the next one in the table, so the parent in the sched_domain
hierarchy.

> - "subset" means really "subset" and not "proper subset" ?

Yes, it's really "subset" and not "proper subset".
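To illustrate the table ordering, here is roughly what the default table
looks like (a sketch based on the patch set; the exact field order, macro
and level names may differ slightly from the posted version, e.g. the
"CPU" level of the commit message appears as DIE below). The parent of
each level is simply the next entry in the table, so SMT's parent is MC
and MC's parent is DIE:

/*
 * Each entry provides the cpumask function, the topology-flags function
 * and a name. Each level's cpumask must be a subset of the next entry's
 * cpumask, since the next entry becomes its parent sched_domain.
 */
static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

An arch that wants an extra level, like the GMC level you are testing on
TC2, builds its own table the same way (GMC entry placed just before MC)
and registers it with set_sched_topology().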
Vincent

> On TC2 w/ the following change in cpu_corepower_mask()
>
> const struct cpumask *cpu_corepower_mask(int cpu)
> {
> -	return &cpu_topology[cpu].thread_sibling;
> +	return cpu_topology[cpu].socket_id ? &cpu_topology[cpu].thread_sibling :
> +		&cpu_topology[cpu].core_sibling;
> }
>
> I get this e.g. for CPU0,2:
>
> CPU0: cpu_corepower_mask=0-1   -> GMC is subset of MC
> CPU0: cpu_coregroup_mask=0-1
> CPU0: cpu_cpu_mask=0-4
>
> CPU2: cpu_corepower_mask=2     -> GMC is proper subset of MC
> CPU2: cpu_coregroup_mask=2-4
> CPU2: cpu_cpu_mask=0-4
>
> I assume here that this is a correct set-up.
>
> The domain degenerate part:
>
> "useless levels like those with 1 CPU" ... that's the case for the GMC
> level for CPU2,3,4.
>
> The GMC level is destroyed because of the following code snippet in
> sd_degenerate():
>
>	if (cpumask_weight(sched_domain_span(sd)) == 1)
>
> so that's fine.
>
> In case of CPU0,1, since GMC and MC have the same span, the code in
> build_sched_groups() creates only one group for MC, and that's why
> pflags is altered in sd_parent_degenerate() to SD_WAKE_AFFINE (0x20),
> the if condition 'if (~cflags & pflags)' is not hit, and
> sd_parent_degenerate() finally returns 1 for MC.
>
> So the "those with the same CPU span and the same relevant
> load-balancing information as their child" part is not so easy to
> understand for me. Because both levels have the same span, we actually
> don't take into consideration the flags of the parent which require at
> least 2 groups.
>
> So the TC2 example covers two corner cases for me: (1) the level I want
> to get rid of only contains 1 CPU (GMC for CPU2,3,4), and (2) the span
> of the parent level I want to get rid of (MC for CPU0,1) is the same as
> the span of the level which should stay.
>
> Are these two corner cases the only ones supported here? If yes, this
> has to be stated somewhere; otherwise, if somebody tries this approach
> on a different topology, (s)he might be surprised.
>
> If we only consider SD_SHARE_POWERDOMAIN for the socket-related level,
> this works fine.
>
> I would like to test this on more platforms, but I only have my TC2
> available :-)
>
> -- Dietmar
>
> [...]
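PS: for completeness, the degenerate sequence you describe looks roughly
like this (abridged from kernel/sched/core.c of this era; treat it as a
sketch rather than the exact patched code, comments added for the TC2
case):

static int sd_degenerate(struct sched_domain *sd)
{
	/* Corner case (1): a level spanning a single CPU is always removed */
	if (cpumask_weight(sched_domain_span(sd)) == 1)
		return 1;

	/* These flags need at least 2 groups to be meaningful */
	if (sd->flags & (SD_LOAD_BALANCE |
			 SD_BALANCE_NEWIDLE |
			 SD_BALANCE_FORK |
			 SD_BALANCE_EXEC |
			 SD_SHARE_CPUPOWER |
			 SD_SHARE_PKG_RESOURCES)) {
		if (sd->groups != sd->groups->next)
			return 0;
	}

	/* These flags don't use groups */
	if (sd->flags & (SD_WAKE_AFFINE))
		return 0;

	return 1;
}

static int
sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
{
	unsigned long cflags = sd->flags, pflags = parent->flags;

	if (sd_degenerate(parent))
		return 1;

	/* Corner case (2) only applies when the spans are equal */
	if (!cpumask_equal(sched_domain_span(sd), sched_domain_span(parent)))
		return 0;

	/* Flags needing groups don't count if only 1 group in parent */
	if (parent->groups == parent->groups->next) {
		pflags &= ~(SD_LOAD_BALANCE |
				SD_BALANCE_NEWIDLE |
				SD_BALANCE_FORK |
				SD_BALANCE_EXEC |
				SD_SHARE_CPUPOWER |
				SD_SHARE_PKG_RESOURCES |
				SD_PREFER_SIBLING);
		if (nr_node_ids == 1)
			pflags &= ~SD_SERIALIZE;
	}

	/*
	 * In your CPU0,1 case only SD_WAKE_AFFINE (0x20) survives in
	 * pflags, so MC adds no information over GMC and is removed.
	 */
	if (~cflags & pflags)
		return 0;

	return 1;
}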