On Sat, Oct 29, 2016 at 05:20:17PM +0200, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote:
> > How about taking this chunk (i.e. telling lockdep that we are not holding 
> > this
> > thing) past the iter_op() call, where file_end_write() used to be?
> 
> We can't as that would not fix the use after free (at least for the lockdep
> case - otherwise the call is a no-op).  Once iter_op returns aio_complete
> might have dropped our reference to the file, and another thread might
> have closed the fd so that the fput from aio_complete was the last one.
> 
> This is something that xfstests/323 can reproduce under the right conditions.

Eww...  IOW, as soon as we'd submitted an async iocb, nobody can so much as
look at struct file *or* iocb, right?  Or underlying inode, or any fs-private
data structures attached to it...

I certainly agree that it's a bug, but IMO you are just papering over it.
Just look at e.g.
        written = mapping->a_ops->direct_IO(iocb, &data);

        /*
         * Finally, try again to invalidate clean pages which might have been
         * cached by non-direct readahead, or faulted in by get_user_pages()
         * if the source of the write was an mmap'ed region of the file
         * we're writing.  Either one is a pretty crazy thing to do,
         * so we don't support it 100%.  If this invalidation
         * fails, tough, the write still worked...
         */
        if (mapping->nrpages) {
                invalidate_inode_pages2_range(mapping,
                                              pos >> PAGE_SHIFT, end);
        }
in generic_file_direct_write() - who said that mapping doesn't point into
freed memory by that point?  Or that
        inode_lock(inode);
        ret = generic_write_checks(iocb, from);
        if (ret > 0)
                ret = __generic_file_write_iter(iocb, from);
        inode_unlock(inode);
in generic_file_write_iter() won't blow up on inode_unlock() of a freed inode?
Or that anything of that sort won't happen in other ->write_iter instances,
for that matter?

NAK, with apologies for not having looked at that earlier.  The bug is real,
all right, but this is not a solution - both incomplete and far too brittle.

Why do we play that kind of insane games, anyway?  Why not e.g. refcount
the (async) iocb and have kiocb_free() drop the reference, with io_submit_one()
holding one across the call of aio_run_iocb()?  Cacheline bouncing issues?
Anything more subtle?

Reply via email to