On 11/15/2017 12:28 PM, Michael Bringmann wrote:
> Hello:
> See below.
>
> On 10/16/2017 07:54 AM, Michael Ellerman wrote:
>> Michael Bringmann <m...@linux.vnet.ibm.com> writes:
>>
>>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>>> it may occur that the new resources are to be inserted into nodes
>>> that were not used for memory resources at bootup. Many different
>>> configurations of PowerPC resources may need to be supported depending
>>> upon the environment.
>>
>> Give me some detail please?!
>
> The most important characteristics that I have observed are:
>
> * Dedicated vs. shared resources. Shared resources require information
>   from sources such as the VPHN hcall to determine the CPU-to-node
>   assignment.
> * Memoryless nodes at boot. Nodes need to be defined as 'possible' at
>   boot in order to work with other code modules. Previously, the powerpc
>   code limited the set of possible/online nodes to those which had
>   memory assigned at boot, so subsequent add/remove of CPUs or memory
>   would only work with this subset of possible nodes.
> * Memoryless nodes with CPUs at boot. Due to the previous restriction,
>   nodes that had CPUs but no memory were collapsed into other nodes
>   that did have memory at boot. In practice this meant that the node
>   assignment presented by the runtime kernel differed from the affinity
>   and associativity attributes presented by the device tree or VPHN
>   hcalls. Nodes that might be known to the pHyp were not 'possible' in
>   the runtime kernel because they did not have memory at boot.
>
>>
>>> This patch fixes some problems encountered at
>>
>> What problems?
>
> This patch set fixes a couple of problems:
>
> * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
>   them, are allowed to be 'possible' and 'online'. Memory allocations
>   for those nodes are taken from another node that does have memory,
>   unless and until memory is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still be
>   referenced subsequently by affinity or associativity attributes, are
>   kept in the list of 'possible' nodes for powerpc. Hot-add of memory or
>   CPUs to the system can reference these nodes and bring them online
>   instead of redirecting the resources to the set of nodes known to have
>   memory at boot.
>
>>
>>> runtime with configurations that support memory-less nodes, but which
>>> allow CPUs to be added at and after boot.
>>
>> How does it fix those problems?
>
> This is fixed as follows: the code now checks whether the node to which
> a CPU is mapped by 'numa_update_cpu_topology' / 'arch_update_cpu_topology'
> has been initialized and has memory available. If either test fails, a
> call is made to 'try_online_node()' to finish the data structure
> initialization. Only if we are unable to initialize the node at this
> point is the CPU's node assignment collapsed into an existing node.
> After initialization by 'try_online_node()', calls to 'local_memory_node'
> no longer crash for these memoryless nodes.
>
>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index b385cd0..e811dd1 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>>  	return rc;
>>>  }
>>>
>>> +static int verify_node_preparation(int nid)
>>> +{
>>
>> I would not expect a function called "verify" ...
>>
>>> +	if ((NODE_DATA(nid) == NULL) ||
>>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>>> +		if (try_online_node(nid))
>>
>> ... to do something like online a node.
>
> We have changed the function name to 'find_cpu_nid'.
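> For reference, here is the renamed helper assembled from the hunks
> quoted above and below. This is only a sketch of the respin; the final
> body may still change:
>
>     static int find_cpu_nid(int nid)
>     {
>         /* No pg_data_t yet, or the node spans no memory: try to
>          * initialize the node before falling back to another one. */
>         if ((NODE_DATA(nid) == NULL) ||
>             (NODE_DATA(nid)->node_spanned_pages == 0)) {
>             if (try_online_node(nid))
>                 return first_online_node;
>         }
>
>         return nid;
>     }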
Ok, but I would still not expect 'find_cpu_nid' to online the node.

>
>>
>>> +			return first_online_node;
>>> +	}
>>> +
>>> +	return nid;
>>> +}
>>> +
>>>  /*
>>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>>   * characteristics change. This function doesn't perform any locking and is
>>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>>  		/* Use associativity from first thread for all siblings */
>>>  		vphn_get_associativity(cpu, associativity);
>>>  		new_nid = associativity_to_nid(associativity);
>>> -		if (new_nid < 0 || !node_online(new_nid))
>>> +		if (new_nid < 0 || !node_possible(new_nid))
>>>  			new_nid = first_online_node;
>>>
>>> +		new_nid = verify_node_preparation(new_nid);
>>
>> You're being called part-way through CPU hotplug here, are we sure it's
>> safe to go and do memory hotplug from there? What's the locking
>> situation?
>
> We are not doing memory hotplug. We are initializing a node that may be
> used by CPUs or memory, before a CPU hotplug operation can reference it
> in an invalid state. CPU hotplug operations are protected by a range of
> APIs, including cpu_maps_update_begin/cpu_maps_update_done,
> cpus_read_lock/cpus_read_unlock, cpus_write_lock/cpus_write_unlock,
> device locks, and more. Memory hotplug operations, including
> try_online_node, are protected by mem_hotplug_begin/mem_hotplug_done,
> device locks, and more. In the case of CPUs being hot-added to a
> previously memoryless node, the try_online_node operation occurs wholly
> within the CPU locks, with no overlap. Using HMC hot-add/hot-remove
> operations, I have been able to add and remove CPUs on any possible node
> without failures. HMC operations involve a degree of self-serialization,
> though.

For both memory and cpu DLPAR operations we hold the device hotplug lock.

-Nathan

>
>>
>> cheers
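To make the nesting described above concrete, hot-adding a CPU to a
memoryless node orders the locks roughly as in the sketch below. The lock
APIs named are the real kernel ones, but the function itself is
illustrative only, is not part of the patch, and elides the actual call
chain between the locks:

    /* Illustrative lock nesting for CPU DLPAR add to a memoryless node. */
    static int dlpar_cpu_add_nesting_sketch(int nid)
    {
        int rc;

        lock_device_hotplug();      /* held for all memory/cpu DLPAR ops */
        cpu_maps_update_begin();    /* CPU hotplug serialization */
        cpus_write_lock();

        /*
         * numa_update_cpu_topology() runs inside the CPU locks; for a
         * memoryless node it calls try_online_node(), which takes
         * mem_hotplug_begin()/mem_hotplug_done() internally.  The
         * memory-hotplug lock therefore nests wholly inside the CPU
         * locks on this path and is never taken in the other order here.
         */
        rc = try_online_node(nid);

        cpus_write_unlock();
        cpu_maps_update_end();
        unlock_device_hotplug();
        return rc;
    }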