On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <[email protected]> wrote:
> Add support for a polled io_uring context. When a read or write is
> submitted to a polled context, the application must poll for completions
> on the CQ ring through io_uring_enter(2). Polled IO may not generate
> IRQ completions, hence they need to be actively found by the application
> itself.
>
> To use polling, io_uring_setup() must be used with the
> IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
> polled and non-polled IO on an io_uring.
>
> Signed-off-by: Jens Axboe <[email protected]>
[...]
> @@ -102,6 +102,8 @@ struct io_ring_ctx {
>
>         struct {
>                 spinlock_t              completion_lock;
> +               bool                    poll_multi_file;
> +               struct list_head        poll_list;
Please add a comment explaining what protects poll_list against
concurrent modification, and ideally also put lockdep asserts in the
functions that access the list to allow the kernel to sanity-check the
locking at runtime.
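Something like this (just a sketch based on my reading of the patch, not
tested; the SQPOLL condition is my guess at the intended serialization
rule) would document the assumption and let lockdep enforce it:

static void io_iopoll_req_issued(struct io_kiocb *req)
{
        struct io_ring_ctx *ctx = req->ctx;

        /*
         * ->poll_list is only supposed to be modified by the submission
         * side: either a task holding ctx->uring_lock via
         * io_uring_enter(), or the single io_sq_thread() when
         * IORING_SETUP_SQPOLL is set. (This rule is my assumption.)
         */
        if (!(ctx->flags & IORING_SETUP_SQPOLL))
                lockdep_assert_held(&ctx->uring_lock);

        /* ... existing body that inserts req into ctx->poll_list ... */
}

A similar assert in io_do_iopoll() would cover the removal side.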
As far as I understand:
Elements are added by io_iopoll_req_issued(). io_iopoll_req_issued()
can't race with itself because, depending on IORING_SETUP_SQPOLL,
either you have to come through sys_io_uring_enter() (which takes the
uring_lock), or you have to come from the single-threaded
io_sq_thread().
io_do_iopoll() iterates over the list and removes completed items.
io_do_iopoll() is called through io_iopoll_getevents(), which can be
invoked in two ways during normal operation:
- sys_io_uring_enter -> __io_uring_enter -> io_iopoll_check
  -> io_iopoll_getevents; this is only protected by the uring_lock
- io_sq_thread -> io_iopoll_check -> io_iopoll_getevents; this doesn't
  hold any locks
Additionally, io_iopoll_getevents() can be reached through the following
exit paths:
- io_sq_thread -> io_iopoll_reap_events -> io_iopoll_getevents
- io_uring_release -> io_ring_ctx_wait_and_kill ->
io_iopoll_reap_events -> io_iopoll_getevents
- io_uring_release -> io_ring_ctx_wait_and_kill -> io_ring_ctx_free
-> io_iopoll_reap_events -> io_iopoll_getevents
So as far as I can tell, several of these paths can run concurrently
without holding a common lock, which means concurrent access to (and
modification of) the poll_list is possible.
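One way to close this (sketch only, untested; it assumes all of the exit
paths above are allowed to sleep, and that io_iopoll_reap_events() keeps
roughly the shape it has in this patch) would be to take the uring_lock
around the reap loop so it is serialized against io_uring_enter():

static void io_iopoll_reap_events(struct io_ring_ctx *ctx)
{
        if (!(ctx->flags & IORING_SETUP_IOPOLL))
                return;

        /*
         * Serialize against io_uring_enter() by taking the same lock the
         * submission/poll path uses. io_sq_thread() would then also have
         * to take uring_lock around its io_iopoll_check() calls.
         */
        mutex_lock(&ctx->uring_lock);
        while (!list_empty(&ctx->poll_list)) {
                unsigned int nr_events = 0;

                io_iopoll_getevents(ctx, &nr_events, 1);
        }
        mutex_unlock(&ctx->uring_lock);
}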
[...]
> +static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr)
> +{
> +        if (*nr) {
> +                kmem_cache_free_bulk(req_cachep, *nr, reqs);
> +                io_ring_drop_ctx_refs(ctx, *nr);
> +                *nr = 0;
> +        }
> +}
[...]
> +static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
> +                               struct list_head *done)
> +{
> +        void *reqs[IO_IOPOLL_BATCH];
> +        struct io_kiocb *req;
> +        int to_free = 0;
> +
> +        while (!list_empty(done)) {
> +                req = list_first_entry(done, struct io_kiocb, list);
> +                list_del(&req->list);
> +
> +                io_cqring_fill_event(ctx, req->user_data, req->error, 0);
> +
> +                reqs[to_free++] = req;
> +                (*nr_events)++;
> +
> +                fput(req->rw.ki_filp);
> +                if (to_free == ARRAY_SIZE(reqs))
> +                        io_free_req_many(ctx, reqs, &to_free);
> +        }
> +        io_commit_cqring(ctx);
> +
> +        if (to_free)
> +                io_free_req_many(ctx, reqs, &to_free);
Nit: You check here whether to_free==0, and then io_free_req_many()
does that again. You can delete one of those checks; I'd probably
delete this one.
> +}
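IOW, since io_free_req_many() already handles the empty case, the tail of
io_iopoll_complete() could simply be:

        io_commit_cqring(ctx);

        /* io_free_req_many() is a no-op when to_free == 0 */
        io_free_req_many(ctx, reqs, &to_free);
}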
[...]