On 10/03/2018 06:27 AM, Michael Bringmann wrote:
> On 10/02/2018 02:45 PM, Tyrel Datwyler wrote:
>> On 10/02/2018 11:13 AM, Michael Bringmann wrote:
>>>
>>>
>>> On 10/02/2018 11:04 AM, Michal Hocko wrote:
>>>> On Tue 02-10-18 10:14:49, Michael Bringmann wrote:
>>>>> On 10/02/2018 09:59 AM, Michal Hocko wrote:
>>>>>> On Tue 02-10-18 09:51:40, Michael Bringmann wrote:
>>>>>> [...]
>>>>>>> When the device-tree affinity attributes for memory have changed,
>>>>>>> the 'nid' calculated for a memory block points to a different node
>>>>>>> than the one under which the block was installed, previously on the
>>>>>>> source system.  The newly calculated 'nid' may not yet be
>>>>>>> initialized on the target system.  The current memory tracking
>>>>>>> mechanisms do not record the node with which a memory block was
>>>>>>> associated when it was added.  Nathan is looking at adding this
>>>>>>> feature to the new implementation of LMBs, but it is not there
>>>>>>> yet, and it won't be present in earlier kernels without backporting
>>>>>>> a significant number of changes.
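>>>>>>>
>>>>>>> Roughly, the removal path recomputes the nid from the current
>>>>>>> device tree instead of reusing the one recorded at add time,
>>>>>>> something like this (paraphrased from memory of
>>>>>>> arch/powerpc/platforms/pseries/hotplug-memory.c; details vary by
>>>>>>> kernel version):
>>>>>>>
>>>>>>>     /* Sketch, not verbatim: the nid is derived from the *current*
>>>>>>>      * associativity, which a PRRN update may have changed since
>>>>>>>      * the block was added.
>>>>>>>      */
>>>>>>>     block_sz = pseries_memory_block_size();
>>>>>>>     nid = memory_add_physaddr_to_nid(lmb->base_addr);
>>>>>>>     remove_memory(nid, lmb->base_addr, block_sz);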
>>>>>>
>>>>>> Then the patch you have proposed here just papers over a real issue, no?
>>>>>> IIUC then you simply do not remove the memory if you lose the race.
>>>>>
>>>>> The problem occurs when removing memory after an affinity change
>>>>> causes the calculation to reference a node that was previously
>>>>> unreferenced.  Other code in 'mm/memory_hotplug.c' deals with
>>>>> initializing an empty node when adding memory to a system.  The
>>>>> 'removing memory' case is specific to systems that perform LPM and
>>>>> allow device-tree changes.  The powerpc kernel does not have the
>>>>> option of accepting some PRRN requests and rejecting others.  It
>>>>> must perform them all.
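>>>>>
>>>>> For comparison, the add path tolerates a not-yet-initialized node,
>>>>> roughly like this (paraphrasing mm/memory_hotplug.c; the exact shape
>>>>> varies by kernel version):
>>>>>
>>>>>     /* Sketch of the add-side pattern: an offline node gets its
>>>>>      * pgdat allocated before memory is added to it.  The remove
>>>>>      * path has no equivalent check, so NODE_DATA(nid) can still
>>>>>      * be NULL there.
>>>>>      */
>>>>>     if (!node_online(nid)) {
>>>>>             pgdat = hotadd_new_pgdat(nid, start);
>>>>>             if (!pgdat)
>>>>>                     return -ENOMEM;
>>>>>     }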
>>>>
>>>> I am sorry, but you are still too cryptic for me. Either there is a
>>>> correctness issue and the patch doesn't really fix anything, or the
>>>> final race doesn't make any difference and then the ppc code should be
>>>> explicit about that. Checking the node inside the hotplug core code
>>>> just looks like the wrong layer at which to mitigate an arch-specific
>>>> problem. I am not saying the patch is a no-go, but if anything we want
>>>> a big fat comment explaining how this is possible, because right now
>>>> it just points to incorrect API usage.
>>>>
>>>> That being said, this sounds like a pretty much ppc-specific problem,
>>>> and I would _prefer_ it to be handled there (along with a big fat
>>>> comment, of course).
>>>
>>> Let me try again.  Regardless of the path by which we get to this
>>> condition, we currently crash the kernel.  This patch changes that to
>>> a WARN_ON notice and continues executing the kernel without shutting
>>> down the system.  I saw the problem during powerpc testing, because
>>> that is the focus of my work.  There are other paths to this function
>>> besides powerpc.  I feel that the kernel should keep running instead
>>> of halting.
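>>>
>>> To be concrete, the change is of this general shape (a sketch of the
>>> guard, not the literal patch):
>>>
>>>     pg_data_t *pgdat = NODE_DATA(nid);
>>>
>>>     /* A post-migration PRRN update can yield a nid whose pgdat was
>>>      * never allocated on the target; warn and bail out instead of
>>>      * dereferencing a NULL pointer.
>>>      */
>>>     if (WARN_ON(!pgdat))
>>>             return;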
>>
>> This is still basically a hack to get around a known race. In itself
>> this patch is still worthwhile, in that we shouldn't crash the kernel
>> on a NULL pointer dereference. However, I think the actual problem
>> still needs to be addressed. We shouldn't run any PRRN events for the
>> source system on the target after a migration. The device-tree update
>> should have taken care of telling us about new affinities and whatnot.
>> Can we just throw out any queued PRRN events when we wake up on the
>> target?
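>>
>> For instance, something along these lines in post_mobility_fixup()
>> (sketch only; prrn_cancel_pending() is a hypothetical helper that
>> would wrap cancel_work_sync(&prrn_work) in rtasd.c, where the PRRN
>> work item lives):
>>
>>     /* Discard any PRRN work queued before the suspend, then apply
>>      * the updated device tree for the target system.
>>      */
>>     prrn_cancel_pending();
>>     rc = pseries_devicetree_update(MIGRATION_SCOPE);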
> 
> We are not talking about queued events provided on the source system,
> but about new PRRN events sent by phyp to the kernel on the target
> system to update the kernel state after migration.  There is no way to
> predict their content.

Okay, but either way, shouldn't your other proposed patches, which update 
memory affinity by re-adding memory and which change when topology updates 
are stopped so that the post-mobility updates are included, put things in 
the right nodes? Or am I missing something? I would assume a PRRN on the 
target would assume the target was up-to-date with respect to where things 
are supposed to be located.

-Tyrel

> 
>>
>> -Tyrel
>>>
>>> Regards,
>>>
> 
> Michael
> 
