On Sat, Oct 29, 2016 at 12:07:07PM -0700, Linus Torvalds wrote:
> On Sat, Oct 29, 2016 at 11:52 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> >
> > And that call can happen as soon as we return from __blockdev_direct_IO()
> > (even earlier, actually).  As soon as that happens, the reference to
> > struct file we'd acquired in io_submit_one() is dropped.  If descriptor
> > table had been shared, another thread might have already closed that sucker,
> > and fput() from aio_complete() would free struct file.
> 
> But that's the point. We don't *do* anything like that any more. We
> now always do the final access from aio_complete(). So it doesn't
> matter if that is called asynchronously (very early) or not.
> 
> That's the whole point of the patch. Exactly to do everything either
> *before* we even submit it (at which point no completion can happen),
> or doing it in aio_complete() which is guaranteed to be after the
> submission. No races, no use-after-free.
> 
> What am I missing?

        if (ret > 0)
                ret = __generic_file_write_iter(iocb, from);
        inode_unlock(inode);
in generic_file_write_iter() (inode might be gone here)
        written = mapping->a_ops->direct_IO(iocb, &data);

        /*
         * Finally, try again to invalidate clean pages which might have been
         * cached by non-direct readahead, or faulted in by get_user_pages()
         * if the source of the write was an mmap'ed region of the file
         * we're writing.  Either one is a pretty crazy thing to do,
         * so we don't support it 100%.  If this invalidation
         * fails, tough, the write still worked...
         */
        if (mapping->nrpages) {
                invalidate_inode_pages2_range(mapping,
                                              pos >> PAGE_SHIFT, end);
in generic_file_direct_write() (mapping points into inode, which might be
freed)
        ret = __blockdev_direct_IO(iocb, inode, target->bt_bdev, &data,
                        xfs_get_blocks_direct, NULL, NULL, 0);
        if (ret >= 0) {
                iocb->ki_pos += ret;
                iov_iter_advance(to, ret);
        }
        xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
in xfs_file_dio_aio_read() (ip points to xfs-private part of inode).

And that - just from a quick look.  We *do* access inode between the call
of __blockdev_direct_IO() and return from ->write_iter().  What's more,
as soon as aio_complete() has happened, what's to stop close from another
thread + umount + rmmod unmapping that ->write_iter() completely?

AFAICS, the possibility of dropping the last reference to struct file
before ->write_iter() has returned is fundamentally broken.  I might be
missing something subtle here, but...

Reply via email to