On Thu 19-07-18 19:04:46, Ming Lei wrote:
> On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() returns only pages for a single non-empty
> > segment of the input iov_iter's iovec. This may be much less than the number
> > of pages __blkdev_direct_IO_simple() is supposed to process. Call
> > bio_iov_iter_get_pages() repeatedly until either the requested number
> > of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done,
> > short writes or reads may occur for direct synchronous IOs with multiple
> > iovec slots (such as generated by writev()). In that case,
> > __generic_file_write_iter() falls back to buffered writes, which
> > has been observed to cause data corruption in certain workloads.
> > 
> > Note: if segments aren't page-aligned in the input iovec, this patch may
> > result in multiple adjacent slots of the bi_io_vec array to reference the 
> > same
> > page (the byte ranges are guaranteed to be disjunct if the preceding patch 
> > is
> > applied). We haven't seen problems with that in our and the customer's
> > tests. It'd be possible to detect this situation and merge bi_io_vec slots
> > that refer to the same page, but I prefer to keep it simple for now.
> > 
> > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified 
> > bdev direct-io")
> > Signed-off-by: Martin Wilck <[email protected]>
> > ---
> >  fs/block_dev.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 0dd87aa..41643c4 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
> > iov_iter *iter,
> >  
> >     ret = bio_iov_iter_get_pages(&bio, iter);
> >     if (unlikely(ret))
> > -           return ret;
> > +           goto out;
> > +
> > +   while (ret == 0 &&
> > +          bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0)
> > +           ret = bio_iov_iter_get_pages(&bio, iter);
> > +
> >     ret = bio.bi_iter.bi_size;
> >  
> >     if (iov_iter_rw(iter) == READ) {
> > @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct 
> > iov_iter *iter,
> >             put_page(bvec->bv_page);
> >     }
> >  
> > +out:
> >     if (vecs != inline_vecs)
> >             kfree(vecs);
> >
> 
> You might put the 'vecs' leak fix into another patch, and resue the
> current code block for that.
> 
> Looks all users of bio_iov_iter_get_pages() need this kind of fix, so
> what do you think about the following way?

No. AFAICT all the other users of bio_iov_iter_get_pages() are perfectly
fine with it returning less pages and they loop appropriately.

                                                                Honza
-- 
Jan Kara <[email protected]>
SUSE Labs, CR

Reply via email to