On Thu, 2018-07-12 at 09:08 -0600, Jens Axboe wrote:
> On 7/12/18 8:36 AM, Hannes Reinecke wrote:
> > Hi Jens, Christoph,
> >
> > we're currently hunting down a silent data corruption occurring due
> > to
> > commit 72ecad22d9f1 ("block: support a full bio worth of IO for
> > simplified bdev direct-io").
> >
> > While the whole thing is still hazy on the details, the one thing
> > we've
> > found is that reverting that patch fixes the data corruption.
> >
> > And looking closer, I've found this:
> >
> > static ssize_t
> > blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
> > {
> > int nr_pages;
> >
> > nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
> > if (!nr_pages)
> > return 0;
> > if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
> > return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
> >
> > return __blkdev_direct_IO(iocb, iter, min(nr_pages,
> > BIO_MAX_PAGES));
> > }
> >
> > When checking the call path
> > __blkdev_direct_IO()->bio_alloc_bioset()->bvec_alloc()
> > I found that bvec_alloc() will fail if nr_pages > BIO_MAX_PAGES.
> >
> > So why is there the check for 'nr_pages <= BIO_MAX_PAGES' ?
> > It's not that we can handle it in __blkdev_direct_IO() ...
>
> The logic could be cleaned up like below, the sync part is really all
> we care about. What is the test case for this? async or sync?
It's sync, and the corruption is in the __blkdev_direct_IO_simple()
path. It starts to occur with your "block: support a full bio worth of
IO for simplified bdev direct-io" patch, which causes the "simple" path
to be taken for larger IO sizes.
Martin
>
> I also don't remember why it's BIO_MAX_PAGES + 1...
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 0dd87aaeb39a..14ef3d71b55f 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -424,13 +424,13 @@ blkdev_direct_IO(struct kiocb *iocb, struct
> iov_iter *iter)
> {
> int nr_pages;
>
> - nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
> + nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
> if (!nr_pages)
> return 0;
> - if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
> + if (is_sync_kiocb(iocb))
> return __blkdev_direct_IO_simple(iocb, iter,
> nr_pages);
>
> - return __blkdev_direct_IO(iocb, iter, min(nr_pages,
> BIO_MAX_PAGES));
> + return __blkdev_direct_IO(iocb, iter, nr_pages);
> }
>
> static __init int blkdev_init(void)
>
--
Dr. Martin Wilck <[email protected]>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)