On Fri, Jan 30, 2026 at 02:31:54PM -0800, Samuel Wu wrote:
> On Thu, Jan 29, 2026 at 11:02 PM Al Viro <[email protected]> wrote:
> > OK.  Could you take a clone of mainline repository and in there run
> > ; git fetch git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-wsamuel:for-wsamuel
> > then
> > ; git diff for-wsamuel e5bf5ee26663
> > to verify that for-wsamuel is identical to tree you've seen breakage on
> > ; git diff for-wsamuel-base 1544775687f0
> > to verify that for-wsamuel-base is the tree where the breakage did not 
> > reproduce
> > Then bisect from for-wsamuel-base to for-wsamuel.
> >
> > Basically, that's the offending commit split into steps; let's try to figure
> > out what causes the breakage with better resolution...
> 
> Confirming that bisect points to this patch: 09e88dc22ea2 (serialize
> ffs_ep0_open() on ffs->mutex)

So we have something that does O_NDELAY opens of ep0 *and* does not retry on
EAGAIN?

How lovely...  Could you slap
        WARN_ON(ret == -EAGAIN);
right before that
        if (ret < 0)
                return ret;
in there and see which process is doing that?  Regression is a regression,
odd userland or not, but I would like to see what that userland is actually
trying to do there.

*grumble*

IMO at that point we have two problems - one is how to avoid a revert of the
tail of tree-in-dcache series, another is how to deal with quite real
preexisting bugs in functionfs.

Another thing to try (not as a suggestion of a fix, just an attempt to figure
out how badly things would break): in current mainline replace that
        ffs_mutex_lock(&ffs->mutex, file->f_flags & O_NONBLOCK)
in ffs_ep0_open() with
        ffs_mutex_lock(&ffs->mutex, false)
and see how badly things regress for userland.  Again, I'm not saying
that this is a fix - just trying to get some sense of what the userland
is doing.

FWIW, it might make sense to try a lighter serialization in ffs_ep0_open() -
taking it there is due to the following scenario (assuming 6.18 or earlier):
ffs->state is FFS_DEACTIVATED.  ffs->opened is 0.  Two threads attempt to
open ep0.  Here's what happens prior to these patches:

static int ffs_ep0_open(struct inode *inode, struct file *file)
{
        struct ffs_data *ffs = inode->i_private;
 
        if (ffs->state == FFS_CLOSING)
                return -EBUSY;
 
        file->private_data = ffs;
        ffs_data_opened(ffs);

with
static void ffs_data_opened(struct ffs_data *ffs)
{
        refcount_inc(&ffs->ref);
        if (atomic_add_return(1, &ffs->opened) == 1 &&
                        ffs->state == FFS_DEACTIVATED) {
                ffs->state = FFS_CLOSING;
                ffs_data_reset(ffs);
        }
}

IOW, the sequence is
        if (state == FFS_CLOSING)
                return -EBUSY;
        n = atomic_add_return(1, &opened);
        if (n == 1 && state == FFS_DEACTIVATED) {
                state = FFS_CLOSING;
                ffs_data_reset();
        }

See the race there?  If the second open() comes between the
increment of ffs->opened and setting the state to FFS_CLOSING,
it will *not* fail with EBUSY - it will proceed to return to
userland, while the first sucker is crawling through the work
in ffs_data_reset()/ffs_data_clear()/ffs_epfiles_destroy().

What's more, there's nothing to stop that second opener from
calling write() on the descriptor it got.  No exclusion there -
        ffs->state = FFS_READ_DESCRIPTORS;
        ffs->setup_state = FFS_NO_SETUP;
        ffs->flags = 0;
in ffs_data_reset() is *not* serialized against ffs_ep0_write().
Get preempted right after setting ->state and that write()
will go just fine, only to be surprised when the first thread
regains CPU and continues modifying the contents of *ffs
under whatever the second thread is doing.

That code obviously relies upon that kind of shit being prevented
by the -EBUSY logic in ep0 open(), and that logic is obviously
racy as it is.  Note that other callers of ffs_data_reset() have
a similar problem: ffs_func_set_alt(), for example, has
        if (ffs->state == FFS_DEACTIVATED) {
                ffs->state = FFS_CLOSING;
                INIT_WORK(&ffs->reset_work, ffs_reset_work);
                schedule_work(&ffs->reset_work);
                return -ENODEV;
        }
again, with no exclusion.  Lose CPU just after seeing FFS_DEACTIVATED,
then have another thread open() the sucker and start going through
ffs_data_reset(), only to have us regain CPU and schedule this for
execution:
static void ffs_reset_work(struct work_struct *work)
{
        struct ffs_data *ffs = container_of(work,
                struct ffs_data, reset_work);
        ffs_data_reset(ffs);
}
IOW, stray ffs_data_reset() coming to surprise the opener who'd
just finished ffs_data_reset() during open(2) and proceeded to
write to the damn thing, etc.

That's obviously on the "how do we fix the preexisting bugs" side
of things, though - regression needs to be dealt with ASAP anyway.
