On 08/10/2018 08:21 AM, Srikar Dronamraju wrote: > * Michael Ellerman <m...@ellerman.id.au> [2018-08-10 21:42:28]: > >> Srikar Dronamraju <sri...@linux.vnet.ibm.com> writes: >>> diff --git a/arch/powerpc/include/asm/topology.h >>> b/arch/powerpc/include/asm/topology.h >>> index 16b077801a5f..70f2d2285ba7 100644 >>> --- a/arch/powerpc/include/asm/topology.h >>> +++ b/arch/powerpc/include/asm/topology.h >>> @@ -113,6 +114,10 @@ static inline int timed_topology_update(int nsecs) >>> { >>> return 0; >>> } >>> + >>> +#ifdef CONFIG_SMP >>> +static inline void check_topology_updates(void) {} >>> +#endif >> I realise that's only called from smp.c but it doesn't really need to be >> inside #ifdef CONFIG_SMP, it's harmless to have an empty version for >> SMP=n. >> > I did that just to fix > https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/577// > > >>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c >>> index 4794d6b4f4d2..2aa0ffd954c9 100644 >>> --- a/arch/powerpc/kernel/smp.c >>> +++ b/arch/powerpc/kernel/smp.c >>> @@ -1156,6 +1156,12 @@ void __init smp_cpus_done(unsigned int max_cpus) >>> if (smp_ops && smp_ops->bringup_done) >>> smp_ops->bringup_done(); >>> >>> + /* >>> + * On a shared LPAR, associativity needs to be requested. >>> + * Hence, check for numa topology updates before dumping >>> + * cpu topology >>> + */ >>> + check_topology_updates(); >> I get that you're calling it here because you want to get it correct >> before we do the dump below, but we could move the dump later also. >> >> You mention we need to do the check before the sched domains are >> initialised, but where exactly does that happen? > Almost immediately after smp_cpus_done, > > kernel_init_freeable() calls smp_init() and sched_init_smp() back to > back. > > smp_init() calls smp_cpus_done() > sched_init_smp() calls sched_init_numa() and sched_init_domains() > > sched_init_numa() initialises sched_domains_numa_masks which is the > cpumask that matters the most in our discussions. > sched_init_domains initialised the sched_domain. > > So I dont think we can move any later. > >>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c >>> index 0c7e05d89244..32c13a208589 100644 >>> --- a/arch/powerpc/mm/numa.c >>> +++ b/arch/powerpc/mm/numa.c >>> @@ -1515,6 +1515,7 @@ int start_topology_update(void) >>> lppaca_shared_proc(get_lppaca())) { >>> if (!vphn_enabled) { >>> vphn_enabled = 1; >>> + topology_update_needed = 1; >> I really don't understand what topology_update_needed is for. >> We set it here. > topology_update_needed is even set by numa_update_cpu_topology which > mighet get called from rtasd. Yes. While looking at problems with Shared CPU Topology updates for NUMA last year, I observed that numa_update_cpu_topology was being called before topology_update_init(), and thus, the next call to numa_update_cpu_topology did not correctly initialize the topology based upon VPHN. > >>> setup_cpu_associativity_change_counters(); >>> timer_setup(&topology_timer, topology_timer_fn, >>> TIMER_DEFERRABLE); >>> @@ -1551,6 +1552,19 @@ int prrn_is_enabled(void) >>> return prrn_enabled; >>> } >>> >>> +void __init check_topology_updates(void) >>> +{ >>> + /* Do not poll for changes if disabled at boot */ >>> + if (topology_updates_enabled) >>> + start_topology_update(); >> Here we call start_topology_update(), which might set >> topology_update_needed to 1. >> >>> + >>> + if (topology_update_needed) { >> And then we check it here? >> Why is this logic not *in* start_topology_update() ? > I have split topology_update_init into two parts. First part being > check_topology_update() and the next part being topology_update_init(). > > By the time the current topology_update_init() gets called, > sched_domains are already created. So we need to move this part before. > > However the scheduled topology updates because of vphn can wait, atleast > that how the current code is written. topology_inited indicates if the > scheduled topology updates have been setup. > >>> + bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask), >>> + nr_cpumask_bits); >>> + numa_update_cpu_topology(false); >>> + } >>> +} >> I'm not really clear on what check_topology_update() is responsible for >> doing vs topology_update_init(). Why do we need both? > I am okay to move the complete topology_update_init above and delete the > device_initcall. But then we would be moving the polling of vphn events > to a little bit earlier stage in the initialisation. > >>> @@ -1608,10 +1618,6 @@ static int topology_update_init(void) >>> return -ENOMEM; >>> >>> topology_inited = 1; >> What does that mean? Didn't we already do the init earlier? > If there is an explicit request for topology update via > numa_update_cpu_topology() even before we looked at the feature flags > and set vphn_enabled/prrn_enabled, then we would defer handle it till > the feature flags are checked. topology_inited helps us in achieving > this. Yes. > >>> - if (topology_update_needed) >>> - bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask), >>> - nr_cpumask_bits); >>> - >>> return 0; >>> } >>> device_initcall(topology_update_init); > Thanks for the review. > Looks good.
-- ---------- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 mbri...@us.ibm.com