On Wed, 13 Aug 2025, Al Viro wrote: > On Tue, Aug 12, 2025 at 12:25:14PM +1000, NeilBrown wrote: > > Several filesystems use the results of readdir to prime the dcache. > > These filesystems use d_alloc_parallel() which can block if there is a > > concurrent lookup. Blocking in that case is pointless as the lookup > > will add info to the dcache and there is no value in the readdir waiting > > to see if it should add the info too. > > > > Also these calls to d_alloc_parallel() are made while the parent > > directory is locked. A proposed change to locking will lock the parent > > later, after d_alloc_parallel(). This means it won't be safe to wait in > > d_alloc_parallel() while holding the directory lock. > > > > So this patch introduces d_alloc_noblock() which doesn't block > > but instead returns ERR_PTR(-EWOULDBLOCK). Filesystems that prime the > > dcache now use that and ignore -EWOULDBLOCK errors as harmless. > > > > A few filesystems need more than -EWOULDBLOCK - they need to be able to > > create the missing dentry within the readdir. procfs is a good example > > as the inode number is not known until the lookup completes, so readdir > > must perform a full lookup. > > > > For these filesystems d_alloc_locked() is provided. It will return a > > dentry which is already d_in_lookup() but will also lock it against > > concurrent lookup. The filesystem's ->lookup function must co-operate > > by calling lock_lookup() before proceeding with the lookup. This way we > > can ensure exclusion between a lookup performed in ->iterate_shared and > > a lookup performed in ->lookup. Currently this exclusion is provided by > > waiting in d_wait_lookup(). The proposed changed to dir locking will > > mean that calling d_wait_lookup() (in readdir) while already holding > > i_rwsem could deadlock. > > The last one is playing fast and loose with one assertion that is used > in quite a few places in correctness proofs - that the only thing other > threads do to in-lookup dentries is waiting on them (and that - only > in d_wait_lookup()). I can't tell whether it will be a problem without > seeing what you do in the users of that thing, but that creates an > unpleasant areas to watch out for in the future ;-/
Yeah, it's not my favourite part of the series. > > Which filesystems are those, aside of procfs? > afs in afs_lookup_atsys(). While looking up a name that ends "@sys" it need to look up the prefix with various alternate suffixes appended. So this isn't readdir related, but is a lookup-within-a-lookup. The use of d_add_ci() in xfs is the same basic pattern. overlayfs does something in ovl_lookup_real_one() that I don't understand yet but it seems to need a lookup while the directory is locked. ovl_cache_update is in the ovl iterate_shared code (which in fact holds an exclusive lock). I think this is the same pattern as procfs in that an inode number needs to be allocated at lookup time, but there might be more too it. So it is: procfs and overlayfs for lookup in readdir xfs and afs for nested lookup. The only other approach I could come up with was to arrange some sort of proxy-execution. i.e. instead of d_alloc_locked() provide a d_alloc_proxy() which, if it found a d_in_lookup() dentry, would perform the ->lookup itself with some sort of interlock with lookup_slow etc. That would prevent the DCACHE_PAR_LOOKUP dentry leaking out, but would be more intrusive and would affect the lookup path for filesystems which didn't need it. NeilBrown