On 2019/10/9 20:25, Robin Murphy wrote:
> On 2019-10-08 9:38 am, Yunsheng Lin wrote:
>> On 2019/9/25 18:41, Peter Zijlstra wrote:
>>> On Wed, Sep 25, 2019 at 05:14:20PM +0800, Yunsheng Lin wrote:
>>>> From the discussion above, it seems making node_to_cpumask_map()
>>>> NUMA_NO_NODE aware is the most feasible way to move forward.
>>>
>>> That's still wrong.
>>
>> Hi, Peter
>>
>> It seems this discussion has reached a deadlock.
>>
>> From my understanding, NUMA_NO_NODE, which means no NUMA node preference,
>> is the state that describes the node of a virtual device, or of a physical
>> device that has equal distance to all CPUs.
>>
>> We can be stricter if the device does have a nearer node, but we cannot
>> deny that a device may have no NUMA node preference or node affinity,
>> which also means the control or data buffer can be allocated on the node
>> where the process is running.
>>
>> As you have proposed, making it -2 and having dev_to_node() warn if the
>> device does have a nearer node that was not set by the fw is a way to be
>> stricter.
>>
>> But I think being stricter is not really relevant to NUMA_NO_NODE, because
>> we do need a state to describe a device that has equal distance to all
>> nodes, even if it is not physically scalable.
>>
>> Any better suggestion to move this forward?
> 
> FWIW (since this is in my inbox), it sounds like the fundamental issue is 
> that NUMA_NO_NODE is conflated for at least two different purposes, so trying 
> to sort that out would be a good first step. AFAICS we have genuine "don't 
> care" cases like alloc_pages_node(), where if the producer says it doesn't 
> matter then the consumer is free to make its own judgement on what to do, and 
> fundamentally different "we expect this thing to have an affinity but it 
> doesn't, so we can't say what's appropriate" cases which could really do with 
> some separate indicator like "NUMA_INVALID_NODE".
> 
> The tricky part is then bestowed on the producers to decide whether they can 
> downgrade "invalid" to "don't care". You can technically build 'a device' 
> whose internal logic is distributed between nodes and thus appears to have 
> equal affinity - interrupt controllers, for example, may have per-CPU or 
> per-node interfaces that end up looking like that - so although it's unlikely 
> it's not outright nonsensical. Similarly a 'device' that's actually emulated 
> behind a firmware call interface may well effectively have no real affinity.

We may set the node of a physical device to NUMA_INVALID_NODE when the fw
does not provide one.

But what should alloc_pages_node() do when it is called with nid set to
NUMA_INVALID_NODE?

If device_add() changes the node to a default one (like node 0) when the node
of the device is NUMA_INVALID_NODE, how do we know that the default (like
node 0) is the right one to choose?

From the previous discussion, the points below do not seem to have reached
consensus yet:
1) Do we need a state like NUMA_NO_NODE to describe that the device does not
   have any NUMA preference?

2) What do we do if the fw does not provide a node for the device? Should
   we guess and pick one for it, and if so, how do we do the guessing? Or do
   we leave it as it is and handle it as NUMA_NO_NODE?

The point of adding another state like NUMA_INVALID_NODE seems to be to catch
the case where the device does have a nearer node but the fw does not provide
one, and to give a warning about it, while alloc_pages_node() still needs to
handle it the same way as NUMA_NO_NODE?

If the above is true, then maybe we can move forward with the above goal.
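
To make the goal above concrete, here is a rough userspace sketch of the
behaviour I have in mind. It is not a patch: NUMA_INVALID_NODE (with the -2
value mentioned earlier in the thread), resolve_alloc_node() and
missing_node_warnings are made-up names for illustration only, and local_node
stands in for what numa_mem_id() returns in the kernel. The only part taken
from existing code is that NUMA_NO_NODE is -1 and is resolved to the local
node.

```c
#include <assert.h>
#include <stdio.h>

#define NUMA_NO_NODE      (-1)  /* existing value: no node preference */
#define NUMA_INVALID_NODE (-2)  /* hypothetical: fw did not provide a node */

/* Counts warnings so the behaviour can be checked; illustration only. */
static int missing_node_warnings;

/*
 * Pick the node to allocate from.
 * local_node models the node the caller is running on.
 */
static int resolve_alloc_node(int nid, int local_node)
{
	assert(local_node >= 0);

	if (nid == NUMA_INVALID_NODE) {
		/* Report the missing fw information, then treat it as NO_NODE. */
		missing_node_warnings++;
		fprintf(stderr,
			"node not set by fw, falling back to local node %d\n",
			local_node);
		nid = NUMA_NO_NODE;
	}

	/* "Don't care": the consumer is free to use the local node. */
	if (nid == NUMA_NO_NODE)
		return local_node;

	return nid;
}
```

With this, a device with a valid node keeps it, a NUMA_NO_NODE device silently
gets the local node, and a NUMA_INVALID_NODE device also gets the local node
but leaves a warning behind. Whether the warning belongs in the allocator or
in dev_to_node() is still open.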

Thanks very much for the suggestion.

> Robin.