On Fri, Mar 8, 2019 at 2:55 PM Linus Torvalds
<[email protected]> wrote:
>
> I'm going to run the usual build tests, but also look at the basic
> sanity tests and boot and run them just to be careful before actually
> doing that final "ok pushed out".
While waiting for that, I'm looking at the file pointer refs, because
that was completely buggered in fs/aio.c.
What protects somebody from:
- io_uring_register(IORING_REGISTER_FILES);
- start async IO
- io_uring_register(IORING_UNREGISTER_FILES);
and now it had better synchronize everything.
It looks like it migth work due to
(a) the mutex_lock(&ctx->uring_lock) around registration
(b) the wait_for_completion(&ctx->ctx_done) in __io_uring_register
presumably waits for each outstanding request.
HOWEVER.
The io_kiocb reference counting seems to be the *exact* same bogus
reference counting that fs/aio.c had, with the magical "zero means it
was never initialized and counts as one" handling, which was buggy in
fs/aio.c too, and caused serious problems with races between request
creation (the "synchronous" part) and requests actually being finished
asynchronously (the "completion" part).
IOW, I can already tell that the reference counting looks suspicious with
static void io_free_req(struct io_kiocb *req)
{
if (!refcount_read(&req->refs) || refcount_dec_and_test(&req->refs)) {
where that whole first "oh, a zero refcount is magic" handling looks
*very* suspicious. It basically says "some ops don't do references
properly at all".
This is *exactly* the same bogosity that fds/aio.c had, and it has
*exactly* the same pattern, with the "poll" code doing
/* one for removal from waitqueue, one for this function */
refcount_set(&req->refs, 2);
and then everybody else seems to initialize req->refs to zero at
allocation time, because they magically have no lifetime rules for the
req.
It was bogus garbage in fs/aio.c, and honestly, looking at how much of
the logic looks suspiciously very similar jhere, I suspect it's bogus
garbage here too.
In fact, all the setup code looks suspiciously similar in general. It
has the same broken "prep_rw" and "complete_rw" thing which was racy
with files opened with O_DIRECT "competing" their IO while still
holding i_rwsem on the inode, and then possibly things getting
dropped.
I'm getting the very strong signal that this code had all the logic
taken from fs/aio.c, bugs and all.
Al has patches to fix the aio.c cases. I very much suspect some very
similar patches are needed in fs/io_ring.c.
Linus