I applied your suggestion with a couple of modifications, and it appears to
have worked for the first two migration events: I am no longer seeing the
errors from repeated migrations. The modifications move the OF_DETACHED
check inside the cache-hit branch, so that of_node_check_flag() is never
called with a NULL np on a cache miss, and also clear the stale cache slot.
The revised patch I tested is:
diff --git a/drivers/of/base.c b/drivers/of/base.c
index 466e3c8..8bf64e5 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1096,8 +1096,14 @@ struct device_node *of_find_node_by_phandle(phandle handle)
 
 	if (phandle_cache) {
 		if (phandle_cache[masked_handle] &&
-		    handle == phandle_cache[masked_handle]->phandle)
+		    handle == phandle_cache[masked_handle]->phandle) {
 			np = phandle_cache[masked_handle];
+
+			if (of_node_check_flag(np, OF_DETACHED)) {
+				np = NULL;
+				phandle_cache[masked_handle] = NULL;
+			}
+		}
 	}
 
 	if (!np) {

During a conference call this morning, Tyrel expressed concerns about the
use of the phandle_cache on powerpc at all. I will try another build with
that feature disabled, but without this patch.

Michael

On 07/31/2018 01:34 AM, Michael Ellerman wrote:
> Hi Rob/Frank,
>
> I think we might have a problem with the phandle_cache not interacting
> well with of_detach_node():
>
> Michael Bringmann <m...@linux.vnet.ibm.com> writes:
>> See below.
>>
>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>>> Michael Bringmann <m...@linux.vnet.ibm.com> writes:
>>>
>>>> During LPAR migration, the content of the device tree/sysfs may
>>>> be updated, including deletion and replacement of nodes in the
>>>> tree. When nodes are added to the internal node structures, they
>>>> are appended in FIFO order to a list of nodes maintained by the
>>>> OF code APIs.
>>>
>>> That hasn't been true for several years. The data structure is an n-ary
>>> tree. What kernel version are you working on?
>>
>> Sorry for the error in my description. I oversimplified based on the
>> name of a search iterator. Let me try to provide a better explanation
>> of the problem here.
>>
>> This is the problem: the PPC mobility code receives RTAS requests to
>> delete nodes with platform-/hardware-specific attributes when restarting
>> the kernel after a migration. My example is for migration between a
>> P8 Alpine and a P8 Brazos. Nodes to be deleted may include
>> 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities',
>> 'ibm,sym-encryption-v1', or others.
>>
>> The mobility.c code calls 'of_detach_node' for the nodes and their
>> children. This makes calls to detach the properties and to try to
>> remove the associated sysfs/kernfs files.
>>
>> Then new copies of the same nodes are provided by the PHYP, local
>> copies are built, and a pointer to the 'struct device_node' is passed
>> to of_attach_node. Before the call to of_attach_node, the phandle is
>> initialized to 0 when the data structure is allocated. During the call
>> to of_attach_node, it calls __of_attach_node, which pulls the actual
>> name and phandle from the just-created sub-properties named something
>> like 'name' and 'ibm,phandle'.
>>
>> This is all fine for the first migration. The problem occurs with the
>> second and subsequent migrations, when the PHYP on the new system wants
>> to replace the same set of nodes again, referenced with the same names
>> and phandle values.
>>
>>>
>>>> When nodes are removed from the device tree, they
>>>> are marked OF_DETACHED, but not actually deleted from the system
>>>> to allow for pointers cached elsewhere in the kernel. The order
>>>> and content of the entries in the list of nodes is not altered,
>>>> though.
>>>
>>> Something is going wrong if this is actually happening.
>>>
>>> When the node is detached it should be *detached* from the tree of all
>>> nodes, so it should not be discoverable other than by having an
>>> existing pointer to it.
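
For context, since this is the crux of the problem: the detach path does
unhook the node from the tree, but the only record of the removal is the
OF_DETACHED flag; nothing ever tells the phandle cache. Roughly, this is
what __of_detach_node() in drivers/of/dynamic.c does (a simplified
paraphrase for the list, not the verbatim upstream code):

/*
 * Simplified paraphrase of __of_detach_node() (drivers/of/dynamic.c),
 * not the verbatim upstream code. The node is unlinked from its
 * parent's child list and flagged OF_DETACHED, but the static
 * phandle_cache over in drivers/of/base.c is never updated, so
 * of_find_node_by_phandle() can keep returning the stale pointer.
 */
static void __of_detach_node(struct device_node *np)
{
	struct device_node *parent = np->parent;

	/* The check that fires at drivers/of/dynamic.c:252 in the trace
	 * below, when the phandle lookup hands back an already-detached
	 * node. */
	if (WARN_ON(!parent || of_node_check_flag(np, OF_DETACHED)))
		return;

	/* Unlink np from its parent's singly linked list of children. */
	if (parent->child == np) {
		parent->child = np->sibling;
	} else {
		struct device_node *prev = parent->child;

		while (prev->sibling != np)
			prev = prev->sibling;
		prev->sibling = np->sibling;
	}
	np->sibling = NULL;

	of_node_set_flag(np, OF_DETACHED);	/* flag set; cache untouched */
}

That untouched cache is why, on the second migration, the lookups below
keep handing back the first-boot nodes.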
>> On the second and subsequent migrations, the PHYP tells the system
>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these
>> nodes by its known set of phandle values -- the same handles used
>> by the PHYP on the source system are known on the target system.
>> The mobility.c code calls of_find_node_by_phandle() with these values
>> and ends up locating the first instance of each node, added during
>> the original boot, instead of the second instance of each node
>> created after the first migration. The detach during the second
>> migration fails with errors like:
>>
>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: G        W 4.18.0-rc1-wi107836-v05-120+ #201
>> [ 4565.030737] NIP:  c0000000007c1ea8 LR: c0000000007c1fb4 CTR: 0000000000655170
>> [ 4565.030741] REGS: c0000003f302b690 TRAP: 0700   Tainted: G        W (4.18.0-rc1-wi107836-v05-120+)
>> [ 4565.030745] MSR:  800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22288822  XER: 0000000a
>> [ 4565.030757] CFAR: c0000000007c1fb0 IRQMASK: 1
>> [ 4565.030757] GPR00: c0000000007c1fa4 c0000003f302b910 c00000000114bf00 c0000003ffff8e68
>> [ 4565.030757] GPR04: 0000000000000001 ffffffffffffffff 800000c008e0b4b8 ffffffffffffffff
>> [ 4565.030757] GPR08: 0000000000000000 0000000000000001 0000000080000003 0000000000002843
>> [ 4565.030757] GPR12: 0000000000008800 c00000001ec9ae00 0000000040000000 0000000000000000
>> [ 4565.030757] GPR16: 0000000000000000 0000000000000008 0000000000000000 00000000f6ffffff
>> [ 4565.030757] GPR20: 0000000000000007 0000000000000000 c0000003e9f1f034 0000000000000001
>> [ 4565.030757] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 4565.030757] GPR28: c000000001549d28 c000000001134828 c0000003ffff8e68 c0000003f302b930
>> [ 4565.030804] NIP [c0000000007c1ea8] __of_detach_node+0x8/0xa0
>> [ 4565.030808] LR [c0000000007c1fb4] of_detach_node+0x74/0xd0
>> [ 4565.030811] Call Trace:
>> [ 4565.030815] [c0000003f302b910] [c0000000007c1fa4] of_detach_node+0x64/0xd0 (unreliable)
>> [ 4565.030821] [c0000003f302b980] [c0000000000c33c4] dlpar_detach_node+0xb4/0x150
>> [ 4565.030826] [c0000003f302ba10] [c0000000000c3ffc] delete_dt_node+0x3c/0x80
>> [ 4565.030831] [c0000003f302ba40] [c0000000000c4380] pseries_devicetree_update+0x150/0x4f0
>> [ 4565.030836] [c0000003f302bb70] [c0000000000c479c] post_mobility_fixup+0x7c/0xf0
>> [ 4565.030841] [c0000003f302bbe0] [c0000000000c4908] migration_store+0xf8/0x130
>> [ 4565.030847] [c0000003f302bc70] [c000000000998160] kobj_attr_store+0x30/0x60
>> [ 4565.030852] [c0000003f302bc90] [c000000000412f14] sysfs_kf_write+0x64/0xa0
>> [ 4565.030857] [c0000003f302bcb0] [c000000000411cac] kernfs_fop_write+0x16c/0x240
>> [ 4565.030862] [c0000003f302bd00] [c000000000355f20] __vfs_write+0x40/0x220
>> [ 4565.030867] [c0000003f302bd90] [c000000000356358] vfs_write+0xc8/0x240
>> [ 4565.030872] [c0000003f302bde0] [c0000000003566cc] ksys_write+0x5c/0x100
>> [ 4565.030880] [c0000003f302be30] [c00000000000b288] system_call+0x5c/0x70
>> [ 4565.030884] Instruction dump:
>> [ 4565.030887] 38210070 38600000 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8
>> [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b090000> 2f890000 4cde0020 e9030040
>> [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]---
>>
>> The mobility.c code continues on during the second migration, accepts
>> the definitions of the new nodes from the PHYP, and ends up renaming
>> the new properties, e.g.
>>
>> [ 4565.827296] Duplicate name in base, renamed to "ibm,platform-facilities#1"
>>
>> I don't see any check like 'of_node_check_flag(np, OF_DETACHED)' within
>> of_find_node_by_phandle() to skip nodes that are detached but still
>> present due to caching or use-count considerations. Another possibility
>> to consider is that of_find_node_by_phandle() also uses something called
>> 'phandle_cache', which may hold outdated data, since of_detach_node()
>> has no access to that cache to remove 'OF_DETACHED' nodes from it.
>
> Yes, the phandle_cache looks like it might be the problem.
>
> I saw of_free_phandle_cache() being called as a late_initcall, but didn't
> realise that happens only if MODULES is disabled.
>
> So I don't see anything that invalidates the phandle_cache when a node
> is removed.
>
> The right solution would be for __of_detach_node() to invalidate the
> phandle_cache entry for the node being detached. That's slightly
> complicated by the phandle_cache being static inside base.c.
>
> To test the theory that it's the phandle_cache causing the problems, can
> you try this patch:
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 848f549164cd..60e219132e24 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -1098,6 +1098,9 @@ struct device_node *of_find_node_by_phandle(phandle handle)
>  		if (phandle_cache[masked_handle] &&
>  		    handle == phandle_cache[masked_handle]->phandle)
>  			np = phandle_cache[masked_handle];
> +
> +		if (of_node_check_flag(np, OF_DETACHED))
> +			np = NULL;
>  	}
>  
>  	if (!np) {
>
> cheers

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
m...@linux.vnet.ibm.com
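
P.S. On the point above that the right solution is for __of_detach_node()
to invalidate the phandle_cache entry for the node being detached: since
phandle_cache and phandle_cache_mask are static inside base.c, I would
expect that to end up as a small helper in base.c, with a declaration
shared with dynamic.c (via of_private.h, say). An untested sketch of what
I have in mind -- of_phandle_cache_inval_entry() is just a name made up
for illustration:

/*
 * Hypothetical helper for drivers/of/base.c (untested sketch, invented
 * name): clear the cache slot holding one phandle. phandle_cache and
 * phandle_cache_mask are the existing statics in this file. Callers
 * must hold devtree_lock -- __of_detach_node() already does -- so no
 * locking is taken here.
 */
void of_phandle_cache_inval_entry(phandle handle)
{
	phandle masked;

	if (!phandle_cache || !handle)
		return;

	masked = handle & phandle_cache_mask;
	if (phandle_cache[masked] &&
	    handle == phandle_cache[masked]->phandle)
		phandle_cache[masked] = NULL;
}

If __of_detach_node() called of_phandle_cache_inval_entry(np->phandle)
just before setting OF_DETACHED, the lookup-side checks in the patches
above would no longer be needed.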