On 2019/10/9 20:25, Robin Murphy wrote: > On 2019-10-08 9:38 am, Yunsheng Lin wrote: >> On 2019/9/25 18:41, Peter Zijlstra wrote: >>> On Wed, Sep 25, 2019 at 05:14:20PM +0800, Yunsheng Lin wrote: >>>> From the discussion above, It seems making the node_to_cpumask_map() >>>> NUMA_NO_NODE aware is the most feasible way to move forwad. >>> >>> That's still wrong. >> >> Hi, Peter >> >> It seems this has trapped in the dead circle. >> >> From my understanding, NUMA_NO_NODE which means not node numa preference >> is the state to describe the node of virtual device or the physical device >> that has equal distance to all cpu. >> >> We can be stricter if the device does have a nearer node, but we can not >> deny that a device does not have a node numa preference or node affinity, >> which also means the control or data buffer can be allocated at the node >> where >> the process is running. >> >> As you has proposed, making it -2 and have dev_to_node() warn if the device >> does >> have a nearer node and not set by the fw is a way to be stricter. >> >> But I think maybe being stricter is not really relevant to NUMA_NO_NODE, >> because >> we does need a state to describe the device that have equal distance to all >> node, >> even if it is not physically scalable. >> >> Any better suggestion to move this forward? > > FWIW (since this is in my inbox), it sounds like the fundamental issue is > that NUMA_NO_NODE is conflated for at least two different purposes, so trying > to sort that out would be a good first step. AFAICS we have genuine "don't > care" cases like alloc_pages_node(), where if the producer says it doesn't > matter then the consumer is free to make its own judgement on what to do, and > fundamentally different "we expect this thing to have an affinity but it > doesn't, so we can't say what's appropriate" cases which could really do with > some separate indicator like "NUMA_INVALID_NODE". > > The tricky part is then bestowed on the producers to decide whether they can > downgrade "invalid" to "don't care". You can technically build 'a device' > whose internal logic is distributed between nodes and thus appears to have > equal affinity - interrupt controllers, for example, may have per-CPU or > per-node interfaces that end up looking like that - so although it's unlikely > it's not outright nonsensical. Similarly a 'device' that's actually emulated > behind a firmware call interface may well effectively have no real affinity.
We may set node of the physical device to NUMA_INVALID_NODE when fw does not provide one. But what do we do about NUMA_INVALID_NODE when alloc_pages_node() is called with nid being NUMA_INVALID_NODE? If we change the node to default one(like node 0) when node of device is NUMA_INVALID_NODE in device_add(), how do we know the default one(like node 0) is the right one to choose? >From the privous disccusion, the below seems not get to consensus yet: 1) Do we need a state like NUMA_NO_NODE to describe that the device does not have any numa preference? 2) What do we do if the fw does not provide a node for the device? Should we guess and pick one for it and how do we do the guessing? Or leave it as it is and handle it as NUMA_NO_NODE? The point of adding another state like NUMA_INVALID_NODE seems to catch the case and give a warning above it when the device does have a nearer node and the fw does not provide one, and alloc_pages_node() still need to handle it as NUMA_NO_NODE? If the above is true, then maybe we can move forward with the above goal. Thanks very much for the suggestion. > > Robin. > > . >