On Fri, Feb 6, 2026 at 2:11 AM NeilBrown <[email protected]> wrote: > > On Thu, 05 Feb 2026, Amir Goldstein wrote: > > On Wed, Feb 4, 2026 at 6:09 AM NeilBrown <[email protected]> wrote: > > > > > > From: NeilBrown <[email protected]> > > > > > > When ovl_create_real() is used to create a file on the upper filesystem > > > it needs to return the resulting dentry - positive and hashed. > > > It is usually the case the that dentry passed to the create function > > > (e.g. vfs_create()) will be suitable but this is not guaranteed. The > > > filesystem may unhash that dentry forcing a repeat lookup next time the > > > name is wanted. > > > > > > So ovl_create_real() must be (and is) aware of this and prepared to > > > perform that lookup to get a hash positive dentry. > > > > > > This is currently done under that same directory lock that provided > > > exclusion for the create. Proposed changes to locking will make this > > > not possible - as the name, rather than the directory, will be locked. > > > The new APIs provided for lookup and locking do not and cannot support > > > this pattern. > > > > > > The lock isn't needed. ovl_create_real() can drop the lock and then get > > > a new lock for the lookup - then check that the lookup returned the > > > correct inode. In a well-behaved configuration where the upper > > > filesystem is not being modified by a third party, this will always work > > > reliably, and if there are separate modification it will fail cleanly. > > > > > > So change ovl_create_real() to drop the lock and call > > > ovl_start_creating_upper() to find the correct dentry. Note that > > > start_creating doesn't fail if the name already exists. > > > > > > This removes the only remaining use of ovl_lookup_upper, so it is > > > removed. > > > > > > Signed-off-by: NeilBrown <[email protected]> > > > --- > > > fs/overlayfs/dir.c | 24 ++++++++++++++++++------ > > > fs/overlayfs/overlayfs.h | 7 ------- > > > 2 files changed, 18 insertions(+), 13 deletions(-) > > > > > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c > > > index ff3dbd1ca61f..ec08904d084d 100644 > > > --- a/fs/overlayfs/dir.c > > > +++ b/fs/overlayfs/dir.c > > > @@ -219,21 +219,33 @@ struct dentry *ovl_create_real(struct ovl_fs *ofs, > > > struct dentry *parent, > > > err = -EIO; > > > } else if (d_unhashed(newdentry)) { > > > struct dentry *d; > > > + struct name_snapshot name; > > > /* > > > * Some filesystems (i.e. casefolded) may return an > > > unhashed > > > - * negative dentry from the ovl_lookup_upper() call before > > > + * negative dentry from the ovl_start_creating_upper() > > > call before > > > * ovl_create_real(). > > > > > > According to the new locking rules, if the hashed dentry itself is > > the synchronization object, is it going to be allowed to > > filesystem to unhash the dentry while the dentry still in the > > "creating" scope? It is hard for me to wrap my head around this. > > It can be confusing.... > > It will be important for the name the remain locked (and hashed) until > the operation (create, remove, rename) either succeeds or fails. So > leaving a dentry unhashed will be OK providing a subsequent lookup will > also succeed or fail in the same way. The caller must be able to use > the dentry to access the object (i.e. the inode) on success, but they > is nothing in POSIX that requires that the object still has any > particular name. > > > > > Or do we need this here because some filesystems (casefold in > > particular) are not going to support parallel creations? > > There is no reason that a casefolding filesystem would not support parallel > ops. And it isn't just casefolding that acts like this. At least one of > the special filesystems (tracefs maybe) always unhashes on create. You > only ever get a hashed positive dentry as a result of lookup. > (overlayfs would never see this case of course). > > > > > > * In that case, lookup again after making the newdentry > > > * positive, so ovl_create_upper() always returns a hashed > > > * positive dentry. > > > + * As we have to drop the lock before the lookup a race > > > + * could result in a lookup failure. In that case we > > > return > > > + * an error. > > > */ > > > - d = ovl_lookup_upper(ofs, newdentry->d_name.name, parent, > > > - newdentry->d_name.len); > > > - dput(newdentry); > > > - if (IS_ERR_OR_NULL(d)) > > > + take_dentry_name_snapshot(&name, newdentry); > > > + end_creating_keep(newdentry); > > > + d = ovl_start_creating_upper(ofs, parent, &name.name); > > > + release_dentry_name_snapshot(&name); > > > > OK. not saying no to this (yet) but I have to admit that it is pretty > > ugly that the callers of ovl_create_real() want to create a specific > > stable name, which is could be passed in as const char *name > > and yet we end up doing this weird dance here just to keep the name > > from newdentry. > > There are three callers of ovl_create_real() > > ovl_lookup_or_create() does have a "const char *name". > ovl_create_upper() has a stable dentry from which it can copy a QSTR > ovl_create_temp() would need some sort of dance to keep hold of the > temporary name that was allocated. > > If it weren't for ovl_create_temp() I would agree with you. > > Though we could have the three callers of ovl_start_creating_temp() pass a > "char name[OVL_TEMPNAME_SIZE]" in, then ovl_create_temp() would have > easy access. > I could do that if you like.
Yes, considering that two of the callers are from the same function (ovl_whiteout()) I think that would end up looking nicer. Thanks, Amir.
