Hi Jens
On 11/10/18 11:13 PM, Jens Axboe wrote:
> We only really need the barrier if we're going to be sleeping,
> if we're just polling we're fine with the __set_current_state()
> variant.
>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> fs/block_dev.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index c039abfb2052..e7550b1f9670 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -237,12 +237,23 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct
> iov_iter *iter,
>
> qc = submit_bio(&bio);
> for (;;) {
> - set_current_state(TASK_UNINTERRUPTIBLE);
> + /*
> + * Using non-atomic task state for polling
> + */
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> +
> if (!READ_ONCE(bio.bi_private))
> break;
When we don't use polling, the blkdev_bio_end_io_simple may come on a different
cpu before
submit_bio returns
For example,
__blkdev_direct_IO_simple
qc = submit_bio(&bio)
blkdev_bio_end_io_simple
WRITE_ONCE(bio.bi_private, NULL)
wake_up_process(waiter)
__set_current_state(TASK_UNINTERRUPTIBLE)
READ_ONCE(bio.bi_private)
At the moment, a smp_rmb at least maybe needed before READ_ONCE(bio.bi_private)
to ensure the WRITE_ONCE(bio.bi_private, NULL) is seen if we miss the wakeup.
> +
> if (!(iocb->ki_flags & IOCB_HIPRI) ||
> - !blk_poll(bdev_get_queue(bdev), qc))
> + !blk_poll(bdev_get_queue(bdev), qc)) {
> + /*
> + * If we're scheduling, ensure that task state
> + * change is ordered and seen.
> + */
> + smp_mb();
> io_schedule();
> + }
> }
> __set_current_state(TASK_RUNNING);
>
> @@ -403,13 +414,19 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter
> *iter, int nr_pages)
> return -EIOCBQUEUED;
>
> for (;;) {
> - set_current_state(TASK_UNINTERRUPTIBLE);
> + /*
> + * See __blkdev_direct_IO_simple() loop
> + */
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> +
Similar with above.
> if (!READ_ONCE(dio->waiter))
> break;
>
> if (!(iocb->ki_flags & IOCB_HIPRI) ||
> - !blk_poll(bdev_get_queue(bdev), qc))
> + !blk_poll(bdev_get_queue(bdev), qc)) {
> + smp_mb();
> io_schedule();
> + }
> }
> __set_current_state(TASK_RUNNING);
>
>
Thanks
Jianchao