On Thu 19-07-18 19:04:46, Ming Lei wrote:
> On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() returns only pages for a single non-empty
> > segment of the input iov_iter's iovec. This may be much less than the number
> > of pages __blkdev_direct_IO_simple() is supposed to process. Call
> > bio_iov_iter_get_pages() repeatedly until either the requested number
> > of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done,
> > short writes or reads may occur for direct synchronous IOs with multiple
> > iovec slots (such as generated by writev()). In that case,
> > __generic_file_write_iter() falls back to buffered writes, which
> > has been observed to cause data corruption in certain workloads.
> >
> > Note: if segments aren't page-aligned in the input iovec, this patch may
> > result in multiple adjacent slots of the bi_io_vec array to reference the
> > same
> > page (the byte ranges are guaranteed to be disjunct if the preceding patch
> > is
> > applied). We haven't seen problems with that in our and the customer's
> > tests. It'd be possible to detect this situation and merge bi_io_vec slots
> > that refer to the same page, but I prefer to keep it simple for now.
> >
> > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified
> > bdev direct-io")
> > Signed-off-by: Martin Wilck <[email protected]>
> > ---
> > fs/block_dev.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 0dd87aa..41643c4 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct
> > iov_iter *iter,
> >
> > ret = bio_iov_iter_get_pages(&bio, iter);
> > if (unlikely(ret))
> > - return ret;
> > + goto out;
> > +
> > + while (ret == 0 &&
> > + bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0)
> > + ret = bio_iov_iter_get_pages(&bio, iter);
> > +
> > ret = bio.bi_iter.bi_size;
> >
> > if (iov_iter_rw(iter) == READ) {
> > @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct
> > iov_iter *iter,
> > put_page(bvec->bv_page);
> > }
> >
> > +out:
> > if (vecs != inline_vecs)
> > kfree(vecs);
> >
>
> You might put the 'vecs' leak fix into another patch, and resue the
> current code block for that.
>
> Looks all users of bio_iov_iter_get_pages() need this kind of fix, so
> what do you think about the following way?
No. AFAICT all the other users of bio_iov_iter_get_pages() are perfectly
fine with it returning less pages and they loop appropriately.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR