> > node 0 (because firmware doesn't provide the distance information for
> > memoryless/cpuless nodes):
> >
> > node   0   1   2   3
> >   0:  10  40  10  10
> >   1:  40  10  40  40
> >   2:  10  40  10  10
> >   3:  10  40  10  10
>
> *groan*... what does it do for things like percpu memory? ISTR the
> per-cpu chunks are all allocated early too. Having them all use memory
> out of node-0 would seem sub-optimal.

In the specific failing case, there is only one node with memory; all other
nodes are CPU-only nodes.

However, in the generic case, since this is just a CPU hotplug operation, the
memory allocated early for per-cpu chunks would remain.

Maybe Michael Ellerman can correct me here.

>
> > We should have:
> >
> > node   0   1   2   3
> >   0:  10  40  40  40
> >   1:  40  10  40  40
> >   2:  40  40  10  40
> >   3:  40  40  40  10
>
> Can it happen that it introduces a new distance in the table? One that
> hasn't been seen before? This example only has 10 and 40, but suppose
> the new node lands at distance 20 (or 80); can such a thing happen?
>
> If not; why not?

Yes, distances can be 20, 40 or 80. Nothing forces the node distance to
always be 40.

> So you're relying on sched_domain_numa_masks_set/clear() to fix this up,
> but that in turn relies on the sched_domain_numa_levels thing to stay
> accurate.
>
> This all seems very fragile and unfortunate.
>

Any reasons why this is fragile?

-- 
Thanks and Regards
Srikar Dronamraju