On 11/18/18 6:57 PM, Dave Chinner wrote:
> On Sat, Nov 17, 2018 at 04:53:17PM -0700, Jens Axboe wrote:
>> Needs further work, but this should work fine on normal setups
>> with a file system on a pollable block device.
> 
> What should work fine? I've got no idea what this patch is actually
> providing....

Sorry, I should have made it clearer that this one is just a test
patch to enable polled aio for files; it's not meant to be
considered complete.

>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>  fs/aio.c       | 2 ++
>>  fs/direct-io.c | 4 +++-
>>  fs/iomap.c     | 7 +++++--
>>  3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/aio.c b/fs/aio.c
>> index 500da3ffc376..e02085fe10d7 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1310,6 +1310,8 @@ static struct block_device *aio_bdev_host(struct kiocb *req)
>>  
>>      if (S_ISBLK(inode->i_mode))
>>              return I_BDEV(inode);
>> +    else if (inode->i_sb && inode->i_sb->s_bdev)
>> +            return inode->i_sb->s_bdev;
> 
> XFS might be doing AIO to files on real-time device, not
> inode->i_sb->s_bdev. So this may well be the wrong block device
> for the IO being submitted.

See the patch description; XFS doing AIO on a real-time device is not
a "normal setup".

It doesn't work for btrfs either, for instance. As far as I can tell,
there are a few routes we can take here:

1) Add an fs op that returns the bdev.
2) Set it after submit time, like we do in other spots. This won't
   work if the IO straddles devices.

I don't feel that strongly about it, and frankly I'm fine with just
dropping non-bdev support for now. The patch is provided to enable
testing for the 99% use case, which is a file system on top of a
single device.
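Option 1 above could be sketched roughly as below. This is a userspace
mock, not kernel code: the hook name iopoll_bdev and the inode_ops
struct are made up for illustration; a real version would hang off
file_operations or similar, and XFS could point the hook at its RT
device while btrfs could decline.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structs involved. */
struct block_device { int id; };
struct super_block { struct block_device *s_bdev; };

struct inode_ops {
	/* Hypothetical fs-supplied hook: return the bdev to poll on,
	 * or NULL if the fs can't give a single answer (multi-device). */
	struct block_device *(*iopoll_bdev)(struct super_block *sb);
};

struct inode {
	struct super_block *i_sb;
	const struct inode_ops *ops;
};

static struct block_device rt_dev = { 2 };

/* XFS-like fs that does file IO to a separate real-time device. */
static struct block_device *xfs_like_iopoll_bdev(struct super_block *sb)
{
	(void)sb;
	return &rt_dev;
}

static struct block_device *aio_bdev_host(struct inode *inode)
{
	/* Prefer the fs-supplied answer; fall back to s_bdev, which
	 * matches the "normal setup" of one fs on one device. */
	if (inode->ops && inode->ops->iopoll_bdev)
		return inode->ops->iopoll_bdev(inode->i_sb);
	if (inode->i_sb && inode->i_sb->s_bdev)
		return inode->i_sb->s_bdev;
	return NULL;
}
```

The fallback keeps the patch's current behavior, while a filesystem
that knows better (RT device, multi-device) can override or veto it.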

>> diff --git a/fs/iomap.c b/fs/iomap.c
>> index 74c1f37f0fd6..4cf412b6230a 100644
>> --- a/fs/iomap.c
>> +++ b/fs/iomap.c
>> @@ -1555,6 +1555,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
>>      struct page *page = ZERO_PAGE(0);
>>      int flags = REQ_SYNC | REQ_IDLE;
>>      struct bio *bio;
>> +    blk_qc_t qc;
>>  
>>      bio = bio_alloc(GFP_KERNEL, 1);
>>      bio_set_dev(bio, iomap->bdev);
>> @@ -1570,7 +1571,9 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
>>      bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
>>  
>>      atomic_inc(&dio->ref);
>> -    return submit_bio(bio);
>> +    qc = submit_bio(bio);
>> +    WRITE_ONCE(dio->iocb->ki_blk_qc, qc);
>> +    return qc;
>>  }
> 
> Why is this added to sub-block zeroing IO calls? It gets overwritten
> by the data IO submission, so this value is going to change as the
> IO progresses. What does making these partial IOs visible provide,
> especially as they then get overwritten by the next submissions?
> Indeed, how does one wait on all IOs in the DIO to complete if we
> are only tracking one of many?

That does look like a mistake; the zeroing submission should just be
ignored, since we only care about the cookie for the actual data IO.
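The overwrite Dave points out can be shown with a tiny userspace mock
(simplified blk_qc_t/kiocb stand-ins, not the kernel types): storing
the zeroing bio's cookie is pointless because the data bio's
submission clobbers it immediately afterwards.

```c
#include <assert.h>

typedef unsigned int blk_qc_t;
struct kiocb { blk_qc_t ki_blk_qc; };

static blk_qc_t next_cookie;

/* Stand-in for submit_bio(): hands out a fresh completion cookie. */
static blk_qc_t mock_submit_bio(void)
{
	return next_cookie++;
}

/* What the patch does: the sub-block zeroing path stores its cookie,
 * then the data IO submission overwrites it. */
static void dio_submit_with_zeroing_store(struct kiocb *iocb)
{
	iocb->ki_blk_qc = mock_submit_bio();	/* zeroing: store is wasted */
	iocb->ki_blk_qc = mock_submit_bio();	/* data IO: last writer wins */
}

/* The fix: only the actual data IO records its cookie. */
static void dio_submit_fixed(struct kiocb *iocb)
{
	(void)mock_submit_bio();		/* zeroing: cookie ignored */
	iocb->ki_blk_qc = mock_submit_bio();	/* data IO */
}
```

Both variants leave the same cookie in the iocb, which is exactly why
the store in iomap_dio_zero() buys nothing.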

-- 
Jens Axboe
