Hello:
    See below.

On 10/16/2017 07:54 AM, Michael Ellerman wrote:
> Michael Bringmann <m...@linux.vnet.ibm.com> writes:
>
>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>> it may occur that the new resources are to be inserted into nodes
>> that were not used for memory resources at bootup.  Many different
>> configurations of PowerPC resources may need to be supported depending
>> upon the environment.
>
> Give me some detail please?!

The most important characteristics that I have observed are:

* Dedicated vs. shared resources.  Shared resources require
  information such as the VPHN hcall for CPU assignment to nodes.

* Memoryless nodes at boot.  Nodes need to be defined as 'possible'
  at boot for operation with other code modules.  Previously, the
  powerpc code would limit the set of possible/online nodes to those
  which have memory assigned at boot.  Subsequent add/remove of CPUs
  or memory would only work with this subset of possible nodes.

* Memoryless nodes with CPUs at boot.  Due to the previous
  restriction on nodes, nodes that had CPUs but no memory were being
  collapsed into other nodes that did have memory at boot.  In
  practice this meant that the node assignment presented by the
  runtime kernel differed from the affinity and associativity
  attributes presented by the device tree or VPHN hcalls.  Nodes that
  might be known to the pHyp were not 'possible' in the runtime
  kernel because they did not have memory at boot.

>
>> This patch fixes some problems encountered at
>
> What problems?

This patch set fixes a couple of problems:

* Nodes known to powerpc to be memoryless at boot, but to have CPUs
  in them, are now allowed to be 'possible' and 'online'.  Memory
  allocations for those nodes are taken from another node that does
  have memory unless and until memory is hot-added to the node.

* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc.  Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting the resources to the set of
  nodes known to have memory at boot.

>
>> runtime with configurations that support memory-less nodes, but which
>> allow CPUs to be added at and after boot.
>
> How does it fix those problems?

These problems are fixed in a couple of ways.  First, the code now
checks whether the node to which a CPU is mapped by
'numa_update_cpu_topology' / 'arch_update_cpu_topology' has been
initialized and has memory available.  If either test fails, a call
is made to 'try_online_node()' to finish the data structure
initialization.  Only if we are unable to initialize the node at this
point is the CPU's node assignment collapsed into an existing node.
After initialization by 'try_online_node()', calls to
'local_memory_node' no longer crash for these memoryless nodes.

>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index b385cd0..e811dd1 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>  	return rc;
>>  }
>>  
>> +static int verify_node_preparation(int nid)
>> +{
>
> I would not expect a function called "verify" ...
>
>> +	if ((NODE_DATA(nid) == NULL) ||
>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>> +		if (try_online_node(nid))
>
> ... to do something like online a node.

We have changed the function name to 'find_cpu_nid'.
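
For reference, here is roughly what the renamed helper will look like
in the next revision.  This is a sketch only: the body is taken
unchanged from the diff above, the name is the one mentioned just
above, and the comment is added here for explanation.

static int find_cpu_nid(int nid)
{
	/*
	 * If the node was never initialized, or spans no memory,
	 * try to online it now.  Fall back to the first online
	 * node only if that attempt fails.
	 */
	if ((NODE_DATA(nid) == NULL) ||
	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
		if (try_online_node(nid))
			return first_online_node;
	}

	return nid;
}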

>
>> +			return first_online_node;
>> +	}
>> +
>> +	return nid;
>> +}
>> +
>>  /*
>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>   * characteristics change. This function doesn't perform any locking and is
>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>  		/* Use associativity from first thread for all siblings */
>>  		vphn_get_associativity(cpu, associativity);
>>  		new_nid = associativity_to_nid(associativity);
>> -		if (new_nid < 0 || !node_online(new_nid))
>> +		if (new_nid < 0 || !node_possible(new_nid))
>>  			new_nid = first_online_node;
>>  
>> +		new_nid = verify_node_preparation(new_nid);
>
> You're being called part-way through CPU hotplug here, are we sure it's
> safe to go and do memory hotplug from there?  What's the locking
> situation?

We are not doing memory hotplug.  We are initializing a node that may
be used by CPUs or memory before a CPU hotplug operation can
reference it while it is still uninitialized.  CPU hotplug operations
are protected by a range of APIs including
cpu_maps_update_begin/cpu_maps_update_done,
cpus_read_lock/cpus_write_lock and their corresponding unlocks,
device locks, and more.  Memory hotplug operations, including
try_online_node, are protected by mem_hotplug_begin/mem_hotplug_done,
device locks, and more.  In the case of CPUs being hot-added to a
previously memoryless node, the try_online_node operation occurs
wholly within the CPU locks with no overlap.  Using HMC
hot-add/hot-remove operations, I have been able to add and remove
CPUs to any possible node without failures.  HMC operations involve a
degree of self-serialization, though.
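
To illustrate the nesting just described, here is a rough sketch of
the lock ordering as I understand it.  These are not actual call
sites; the exact path depends on the caller of
numa_update_cpu_topology:

	cpus_read_lock();                /* CPU hotplug lock, held by caller */
	  numa_update_cpu_topology(true);
	    new_nid = find_cpu_nid(new_nid);
	      try_online_node(new_nid);
	        mem_hotplug_begin();     /* memory hotplug lock, nested inside */
	        /* ... pgdat allocated, node marked online ... */
	        mem_hotplug_done();
	cpus_read_unlock();

>
> cheers

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:     (512) 466-0650
m...@linux.vnet.ibm.com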