On Tue, 2018-10-02 at 15:35:59 UTC, Nathan Fontenot wrote:
> When removing memory we need to remove the memory from the node
> it was added to instead of looking up the node it should be in
> in the device tree.
>
> During testing we have seen scenarios where the affinity for a
> LMB changes due to a partition migration or PRRN event. In these
> cases the node the LMB exists in may not match the node the device
> tree indicates it belongs in. This can lead to a system crash
> when trying to DLPAR remove the LMB after a migration or PRRN
> event. The current code looks up the node in the device tree to
> remove the LMB from, the crash occurs when we try to offline this
> node and it does not have any data, i.e. node_data[nid] == NULL.
>
> 36:mon> e
> cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
> pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
> lr: c0000000003a14ec: remove_memory+0xbc/0x110
> sp: c0000001828b7a90
> msr: 800000000280b033
> dar: 9a28
> dsisr: 40000000
> current = 0xc0000006329c4c80
> paca = 0xc000000007a55200 softe: 0 irq_happened: 0x01
> pid = 76926, comm = kworker/u320:3
>
> 36:mon> t
> [link register ] c0000000003a14ec remove_memory+0xbc/0x110
> [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
> [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
> [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
> [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
> [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
> [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
> [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
> [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
> [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
> [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
>
> To resolve this we need to track the node a LMB belongs to when
> it is added to the system so we can remove it from that node instead
> of the node that the device tree indicates it should belong to.
>
> Signed-off-by: Nathan Fontenot <[email protected]>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b2d3b5ee66f2a04a918cc043cec0c9ed

cheers
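
For readers skimming the thread, here is a rough user-space sketch of the
idea described in the quoted commit message: record the node at add time
and use that recorded node at remove time, instead of asking the device
tree again. It is not the applied patch. The struct layout and the names
lmb_add(), lmb_remove(), dt_nid, MAX_NODES and NO_NODE are all
hypothetical stand-ins invented for illustration; only the
node_data[nid] == NULL condition comes from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4     /* hypothetical, small fixed node count */
#define NO_NODE   (-1)  /* hypothetical "not added to any node" marker */

/* Hypothetical stand-in for per-node data; a NULL slot means "no data". */
struct node_stub {
    int lmb_count;
};

static struct node_stub node0 = { 0 };

/* Only node 0 has data in this sketch; nodes 1..3 are empty. */
struct node_stub *node_data[MAX_NODES] = { &node0, NULL, NULL, NULL };

/* Hypothetical LMB record. */
struct lmb {
    int dt_nid; /* node the device tree reports right now (may change) */
    int nid;    /* node the LMB was actually added to */
};

/* Add: look the node up once and remember it in the LMB itself. */
int lmb_add(struct lmb *lmb)
{
    int nid = lmb->dt_nid;

    if (nid < 0 || nid >= MAX_NODES || node_data[nid] == NULL)
        return -1;

    node_data[nid]->lmb_count++;
    lmb->nid = nid;
    return 0;
}

/*
 * Remove: use the remembered node and ignore dt_nid, which may have
 * changed since the add (for example after a migration or PRRN event).
 */
int lmb_remove(struct lmb *lmb)
{
    int nid = lmb->nid;

    if (nid < 0 || nid >= MAX_NODES)
        return -1;

    /* The node we added to must still have data. */
    assert(node_data[nid] != NULL);

    node_data[nid]->lmb_count--;
    lmb->nid = NO_NODE;
    return 0;
}
```

In this sketch, changing dt_nid to an empty node between lmb_add() and
lmb_remove() has no effect on the removal, which is the behaviour the
commit message asks for.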
