On Sat, Apr 10, 2021 at 1:04 AM Chris Murphy wrote:
>
> Hi,
>
> The primary problem is Bolt (Thunderbolt 3) tests that are
> experiencing a regression when run in a container using overlayfs,
> failing at:
>
> Bail out! ERROR:../tests/test-common.c:1413:test_io_dir_is_empty:
> 'empty' should be FALSE
>
> https://gitlab.freedesktop.org/bolt/bolt/-/issues/171#note_872119
>
To summarize, the test case is:
- create empty dir
- open empty dir
- getdents => (".", "..")
- create file at (dirfd, "a")
- lseek to offset 0 on dirfd
- getdents => (".", "..") FAIL to see "a"
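For reference, the steps above can be reproduced from user space with something like the following minimal C sketch. It is an illustration, not the bolt test itself: the helper names and the "ovl-readdir-XXXXXX" template are made up, and it calls getdents64 through syscall() to stay close to the summarized steps. It returns 1 if "a" shows up after the rewind, 0 if the stale listing is reproduced, and -1 on a setup error.

```c
#define _GNU_SOURCE		/* for mkdtemp(), O_DIRECTORY and syscall() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by the raw getdents64(2) syscall. */
struct linux_dirent64 {
	uint64_t       d_ino;
	int64_t        d_off;
	unsigned short d_reclen;
	unsigned char  d_type;
	char           d_name[];
};

/* Read dirfd from its current offset to EOF and return the number of
 * entries other than "." and "..", or -1 on error. */
static int count_entries(int dirfd)
{
	_Alignas(8) char buf[4096];
	int count = 0;

	for (;;) {
		long n = syscall(SYS_getdents64, dirfd, buf, sizeof(buf));
		if (n < 0)
			return -1;
		if (n == 0)
			return count;
		for (long off = 0; off < n;) {
			struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
			if (strcmp(d->d_name, ".") && strcmp(d->d_name, ".."))
				count++;
			off += d->d_reclen;
		}
	}
}

/* Run the test case in a fresh directory under parent (hypothetical helper). */
static int repro_readdir_after_create(const char *parent)
{
	char dir[4096];
	int dirfd, fd, n, ret = -1;

	snprintf(dir, sizeof(dir), "%s/ovl-readdir-XXXXXX", parent);
	if (!mkdtemp(dir))				/* create empty dir */
		return -1;
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);	/* open empty dir */
	if (dirfd < 0)
		goto out_rmdir;
	if (count_entries(dirfd) != 0)			/* getdents => (".", "..") */
		goto out_close;
	fd = openat(dirfd, "a", O_CREAT | O_EXCL | O_WRONLY, 0644);
	if (fd < 0)					/* create file at (dirfd, "a") */
		goto out_close;
	close(fd);
	if (lseek(dirfd, 0, SEEK_SET) == 0) {		/* lseek to offset 0 */
		n = count_entries(dirfd);		/* getdents again */
		ret = n < 0 ? -1 : n == 1;
	}
	unlinkat(dirfd, "a", 0);
out_close:
	close(dirfd);
out_rmdir:
	rmdir(dir);
	return ret;
}
```

The parent directory is the only knob: the failure was reported inside the overlayfs-backed container, while outside of it the sketch is expected to return 1.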
It looks like a bug in ovl readdir cache invalidation, except that there is
not supposed to be any caching of a pure upper dir.
One thing I noticed is that ovl_dentry_version_inc() is inconsistent
with ovl_dir_is_real(): the latter checks whether readdir caching would
be used, and the former checks whether invalidating the readdir cache is
needed. We need to change the ovl_dentry_version_inc() test to:
if (ovl_test_flag(OVL_WHITEOUTS, dir) || impurity)
Or better yet:
if (!ovl_dir_is_real() || impurity)
But this still doesn't explain the reported issue.
The OVL_WHITEOUTS inode flag is set in ovl_get_inode() in several
cases including:
ovl_check_origin_xattr(ofs, upperdentry)
So now we are getting closer to something that sounds related to the
reported issue...
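To make the suspected chain concrete, here is a toy user-space model; it is not kernel code, and struct model_dir and the model_* helpers are hypothetical. It assumes ovl_dir_is_real() is simply the inverse of the OVL_WHITEOUTS flag (as the two suggested tests being equivalent implies), and it assumes the current ovl_dentry_version_inc() test keys off the merge type. Under those assumptions, a pure upper dir that got OVL_WHITEOUTS set uses the readdir cache but never has its version bumped by a plain create.

```c
#include <stdbool.h>

/* Hypothetical stand-in for overlayfs dir state, NOT struct ovl_inode. */
struct model_dir {
	bool merge;		/* dir has lower layers */
	bool whiteouts;		/* OVL_WHITEOUTS inode flag is set */
	unsigned long version;	/* readdir cache version */
};

/* Assumed ovl_dir_is_real(): no readdir cache for "real" dirs. */
static bool model_dir_is_real(const struct model_dir *d)
{
	return !d->whiteouts;
}

/* Assumed current test: only merge dirs (or impurity) bump the version. */
static void model_version_inc_old(struct model_dir *d, bool impurity)
{
	if (d->merge || impurity)
		d->version++;
}

/* Suggested test: bump whenever readdir caching may be in use. */
static void model_version_inc_new(struct model_dir *d, bool impurity)
{
	if (!model_dir_is_real(d) || impurity)
		d->version++;
}

/* True if a listing cached before creating a plain entry would still
 * look valid afterwards, i.e. the new entry would be missed. */
static bool model_cache_goes_stale(struct model_dir d,
				   void (*inc)(struct model_dir *, bool))
{
	if (model_dir_is_real(&d))
		return false;		/* no cache, nothing can go stale */
	unsigned long cached = d.version;
	inc(&d, false);			/* create "a": not an impurity change */
	return d.version == cached;
}
```

With merge = false and whiteouts = true, the old test leaves the cache looking valid while the suggested test invalidates it.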
ovl_check_origin_xattr() would return true if
vfs_getxattr(upperdentry, "trusted.overlay.origin", NULL, 0)
returned 0 instead of -ENODATA for some reason, even though that
xattr does not exist.
But we happen to be missing a pr_debug() in ovl_do_getxattr(), so
it's hard to say what's going on.
Chris,
As the first step, can you try the suggested fix to ovl_dentry_version_inc()
and/or adding the missing pr_debug() and including those prints in
your report?
> I can reproduce this with 5.12.0-0.rc6.184.fc35.x86_64+debug and at
> approximately the same time I see one, sometimes more, kernel
> messages:
>
> [ 6295.379283] overlayfs: upper fs does not support xattr, falling
> back to index=off and metacopy=off.
>
Can you say why there is no xattr support?
Is the overlayfs mount executed without privileges to create trusted.* xattrs?
The answer to that may be the key to understanding the bug.
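One quick way to check the privilege question from inside the container is to try creating a trusted.* xattr on a file on the upper filesystem. Per xattr(7), the trusted namespace requires CAP_SYS_ADMIN, so a process without it (for example, in a rootless container) is expected to get EPERM. A rough sketch; can_set_trusted_xattr() and the attribute name are made up, and the attribute is removed again on success.

```c
#include <errno.h>
#include <sys/xattr.h>

/* Returns 1 if a trusted.* xattr could be created on path, 0 if the
 * kernel refused with EPERM or ENOTSUP, -1 on other errors. */
static int can_set_trusted_xattr(const char *path)
{
	static const char name[] = "trusted.probe_hypothetical";

	if (setxattr(path, name, "y", 1, 0) == 0) {
		removexattr(path, name);	/* clean up the probe */
		return 1;
	}
	if (errno == EPERM || errno == ENOTSUP)
		return 0;
	return -1;
}
```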
> But I don't know if that kernel message relates to the bolt test failure.
>
> If I run the test outside of a container, it doesn't fail. If I run
> the test in a podman container using the btrfs driver instead of the
> overlay driver, it doesn't fail. So it seems like this is an overlayfs
> bug, but could be some kind of overlayfs+btrfs interaction.
>
My guess is it has to do with changes related to mounting overlayfs
inside userns, but I couldn't find any immediate suspects.
Do you have any idea when the regression first appeared?
A bisect would have been helpful here.
> Could this be related and just not yet merged?
> https://lore.kernel.org/linux-unionfs/[email protected]/
>
Not likely.
If you want to be sure, run:
echo N > /sys/module/overlay/parameters/xino_auto
before starting the container.
The above commit only matters for xino_auto = Y.
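If it helps, the value can be read back before and after the change; a trivial sketch, equivalent to running cat on the same file (read_xino_auto() is a hypothetical helper, and the file only exists while the overlay module is loaded or built in):

```c
#include <stdio.h>

#define XINO_AUTO_PARAM "/sys/module/overlay/parameters/xino_auto"

/* Bool module parameters read back as "Y" or "N".
 * Returns 1 for Y, 0 for N, -1 if the parameter cannot be read. */
static int read_xino_auto(void)
{
	FILE *f = fopen(XINO_AUTO_PARAM, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == 'Y')
		return 1;
	if (c == 'N')
		return 0;
	return -1;
}
```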
Thanks,
Amir.