Hello,
[resending as plain-text, sorry for the noise]
I noticed that the behavior of the symlinkat() syscall changes when newpath (i.e.
the link name) has a trailing slash and is the path to a directory residing on
NFS: the result depends on whether this path is in the dentry cache or not. I
stumbled upon this in the context of a Yocto / OE-Core system where I updated
coreutils from version 8.30 to 8.31, and it causes problems with ln in coreutils
8.31. I am currently using kernel 5.4.90.
What I observe is that symlinkat("name", AT_FDCWD, "/path/to/nfs/existing/dir/")
returns ENOENT when "/path/to/nfs/existing/dir/" is not in the dentry cache but
EEXIST when it is, and this only happens when "/path/to/nfs/existing/dir/" is on
NFS (NFSv3 in my case). Note that if I remove the trailing slash from the newpath
argument, it returns EEXIST in all cases.
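A minimal caller-side sketch of that observation, assuming it is acceptable to drop the trailing slash (this only avoids the symptom, it is not a kernel fix). The helpers strip_trailing_slashes() and symlinkat_noslash() are hypothetical names: they copy newpath, remove any trailing '/' characters while keeping a lone "/", and then call symlinkat().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: remove trailing '/' characters from path in place,
 * but leave a lone "/" (the root directory) untouched. */
static void strip_trailing_slashes(char *path)
{
	size_t len = strlen(path);

	while (len > 1 && path[len - 1] == '/')
		path[--len] = '\0';
}

/* Hypothetical helper: call symlinkat() on a copy of newpath with its
 * trailing slashes removed. Returns 0 on success, -1 with errno set. */
static int symlinkat_noslash(const char *target, int dirfd, const char *newpath)
{
	char *copy = strdup(newpath);
	int r, saved_errno;

	if (copy == NULL)
		return -1; /* strdup() has set errno (ENOMEM) */
	strip_trailing_slashes(copy);
	r = symlinkat(target, dirfd, copy);
	saved_errno = errno; /* keep symlinkat()'s errno across free() */
	free(copy);
	errno = saved_errno;
	return r;
}
```

Note that this changes semantics when newpath does not exist: without the slash the symlink is created, whereas with the slash symlinkat() fails with ENOENT.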
After the change
https://github.com/coreutils/coreutils/commit/571f63f5010b047a8a3250304053f05949faded4
in coreutils, "ln -sf name /path/to/nfs/existing/dir/" sometimes fails with a
"cannot overwrite directory" error (when the path is not in the dentry cache).
There was no problem before this change because ln did a stat of the link name
path before calling symlinkat, so the entry was already in the dentry cache when
symlinkat executed.
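For comparison, the pre-8.31 ordering can be sketched as a wrapper that stats the link name before calling symlinkat(). stat_then_symlinkat() is a hypothetical helper, not coreutils code; it only sidesteps the problem by warming the dentry cache, and the stat() and symlinkat() calls are not atomic with respect to each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wrapper mimicking the older ln ordering: look up the link
 * name first (which, as a side effect, brings its dentry into the cache),
 * then create the symlink. Returns 0 on success, -1 with errno set. */
static int stat_then_symlinkat(const char *target, const char *newpath)
{
	struct stat st;

	/* The result is deliberately ignored: only the lookup matters here. */
	(void)stat(newpath, &st);
	return symlinkat(target, AT_FDCWD, newpath);
}
```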
I have created a simple program, attached, to reproduce this more easily.
To reproduce:
- Compile the attached symlinkat.c
- Mount an NFSv3 filesystem at /mnt
- mkdir /mnt/test
- To test the error with no dentry cache and a trailing slash:
    sync; echo 3 > /proc/sys/vm/drop_caches; ./symlinkat name /mnt/test/
    symlinkat name /mnt/test/ failed: No such file or directory (2)
- To test with the dentry cache populated:
    ls -d /mnt/test/; ./symlinkat name /mnt/test/
    symlinkat name /mnt/test/ failed: File exists (17)
- To test the error with no dentry cache and no trailing slash:
    sync; echo 3 > /proc/sys/vm/drop_caches; ./symlinkat name /mnt/test
    symlinkat name /mnt/test failed: File exists (17)
Although I'm no kernel expert, from what I understand of the kernel code this
seems to be a bad interaction between the generic fs handling in fs/namei.c and
the NFS client implementation. The filename_create() function calls
__lookup_hash() after setting LOOKUP_EXCL in the flags, and if the path is not
in the dentry cache then nfs_lookup() is called, notices this flag in the
nfs_is_exclusive_create() test, optimizes away the lookup and does not fill
d_inode in the dentry. When execution returns to filename_create(), the special
case check notices that is_dir is not set and that last.name has a trailing
slash, and thus returns ENOENT. Looking for LOOKUP_EXCL usage in the kernel,
only NFS does this kind of optimization in current kernels, but in 3.5 and
older CIFS did the same optimization.
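To make the sequence above easier to follow, here is a simplified userspace model of the decision. It is not kernel code; the function names and boolean inputs are hypothetical. model_lookup_positive() stands in for __lookup_hash()/nfs_lookup(), and model_filename_create() follows the order of the EEXIST and trailing-slash checks in filename_create().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model of whether the lookup hands back a positive dentry (d_inode set).
 * On a cache miss with the NFS LOOKUP_EXCL optimization, the lookup is
 * skipped and the dentry stays negative even if the name exists. */
static bool model_lookup_positive(bool exists, bool in_dcache,
				  bool nfs_excl_optimization)
{
	if (!in_dcache && nfs_excl_optimization)
		return false;
	return exists;
}

/* Returns 0 if creation would proceed, otherwise a negative errno,
 * checking a positive dentry first and then the trailing slash. */
static int model_filename_create(bool positive, bool is_dir,
				 bool trailing_slash)
{
	if (positive)
		return -EEXIST;
	if (!is_dir && trailing_slash)
		return -ENOENT; /* negative dentry + trailing slash */
	return 0;
}
```

With symlinkat() (is_dir false), the cold-cache trailing-slash case yields -ENOENT, the warm-cache case -EEXIST, and the case without a trailing slash returns 0, after which the EEXIST presumably comes from the NFS server's reply to the symlink request.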
According to the symlink(2) and symlinkat(2) man pages, ENOENT is returned when a
directory component of newpath does not exist or is a dangling symbolic link,
which is not the case here.
What would be the best course of action to address this issue?
Thanks,
Diego
--
Diego Santa Cruz, PhD
Technology Architect
spinetix.com
/* symlinkat.c: call symlinkat(oldpath, AT_FDCWD, newpath) and report errno */
#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char *argv[])
{
	int r, err;

	if (argc != 3) {
		fprintf(stderr, "usage: symlinkat <oldpath> <newpath>\n");
		return EXIT_FAILURE;
	}
	r = symlinkat(argv[1], AT_FDCWD, argv[2]);
	if (r != 0) {
		/* Save errno before any other library call can modify it. */
		err = errno;
		fprintf(stderr, "symlinkat %s %s failed: %s (%i)\n",
			argv[1], argv[2], strerror(err), err);
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}