On Thu, Jun 20, 2024 at 02:54:09PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 20, 2024 at 09:36:42AM -0400, Kent Overstreet wrote:
> > On Thu, Jun 20, 2024 at 09:21:57PM +0800, Hongbo Li wrote:
> > > Support fallback to buffered I/O if the operation is performed on an
> > > unaligned length or offset. This may change the behavior for direct
> > > I/O in some cases.
> > > 
> > > [Before]
> > > A read whose length is aligned to 256 bytes (not SECTOR aligned)
> > > fails under direct I/O.
> > > 
> > > [After]
> > > A read whose length is aligned to 256 bytes (not SECTOR aligned)
> > > succeeds under direct I/O because it falls back to buffered I/O.
> 
> This is against the O_DIRECT requirements.
> 
>    O_DIRECT
>        The O_DIRECT flag may impose alignment restrictions on the length
>        and address of user-space buffers and the file offset of I/Os.
>        In Linux alignment restrictions vary by filesystem and kernel
>        version and might be absent entirely. The handling of misaligned
>        O_DIRECT I/Os also varies; they can either fail with EINVAL or
>        fall back to buffered I/O.
>
>        Since Linux 6.1, O_DIRECT support and alignment restrictions for
>        a file can be queried using statx(2), using the STATX_DIOALIGN
>        flag. Support for STATX_DIOALIGN varies by filesystem; see
>        statx(2).
>
>        Some filesystems provide their own interfaces for querying
>        O_DIRECT alignment restrictions, for example the XFS_IOC_DIOINFO
>        operation in xfsctl(3). STATX_DIOALIGN should be used instead
>        when it is available.
>
>        If none of the above is available, then direct I/O support and
>        alignment restrictions can only be assumed from known
>        characteristics of the filesystem, the individual file, the
>        underlying storage device(s), and the kernel version. In Linux
>        2.4, most filesystems based on block devices require that the
>        file offset and the length and memory address of all I/O segments
>        be multiples of the filesystem block size (typically 4096 bytes).
>        In Linux 2.6.0, this was relaxed to the logical block size of the
>        block device (typically 512 bytes). A block device's logical
>        block size can be determined using the ioctl(2) BLKSSZGET
>        operation or from the shell using the command:

That's really just descriptive, not prescriptive.

The intent of O_DIRECT is "bypass the page cache"; the alignment
restrictions are just a side effect of that. What applications care
about is having predictable performance characteristics.
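
To make the alignment rule from the quoted man page concrete, here is a
minimal sketch of the check that decides whether a request can go down
the direct path or has to fail or fall back to buffered I/O. It assumes
the 2.6-style rule (buffer address, file offset and length must all be
multiples of the logical block size). The helper names are hypothetical
and not taken from the patch or from any filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* True if x is a multiple of align; align must be a power of two. */
static bool is_multiple(uint64_t x, uint64_t align)
{
	return (x & (align - 1)) == 0;
}

/*
 * Hypothetical helper: returns true if a request satisfies the
 * alignment rule described in open(2) for Linux 2.6.0 and later,
 * i.e. buffer address, file offset and length are all multiples of
 * the logical block size (lbs, typically 512).
 *
 * A caller that gets false here would either return EINVAL or take
 * the buffered path, depending on the filesystem's policy.
 */
static bool dio_request_aligned(const void *buf, off_t pos, size_t len,
				unsigned int lbs)
{
	/* lbs must be a non-zero power of two for the mask trick above. */
	assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

	return is_multiple((uintptr_t)buf, lbs) &&
	       is_multiple((uint64_t)pos, lbs) &&
	       is_multiple(len, lbs);
}
```

With lbs = 512, a read of length 256 is rejected by this check, which
is the case the patch description is talking about.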

> > The catch is that struct bio - bvec_iter - represents addresses with a
> > sector_t, and we'd want that to be a loff_t.
> > 
> > That's something we should do anyways; everything else in struct bio can
> > represent a byte-aligned io, bvec_iter.bi_sector is the only exception
> > and fixing that would help in consolidating our various scatter-gather
> > list data structures - but we'd need buy-in from Jens and Christoph
> > before doing that.
> 
> I'm against it.  Block devices only do sector-aligned IO and we should
> not pretend otherwise.

Eh?

bio isn't really specific to the block layer anyways, given that an
iov_iter can be a bio underneath. We _really_ should be trying for
better commonality of data structures.
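
As a rough illustration of the bi_sector point quoted above: a position
stored in 512-byte sector units cannot carry the sub-sector part of a
byte offset. The sketch below uses stand-in types (demo_sector_t,
demo_loff_t) and hypothetical helper names; it is not kernel code, just
the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9	/* 512-byte sectors */

/* Stand-ins for the kernel's sector_t and loff_t (assumption). */
typedef uint64_t demo_sector_t;
typedef int64_t demo_loff_t;

/* Byte position -> sector number; the low 9 bits are dropped. */
static demo_sector_t pos_to_sector(demo_loff_t pos)
{
	assert(pos >= 0);
	return (demo_sector_t)pos >> DEMO_SECTOR_SHIFT;
}

/* Sector number -> byte position of the start of that sector. */
static demo_loff_t sector_to_pos(demo_sector_t sector)
{
	return (demo_loff_t)(sector << DEMO_SECTOR_SHIFT);
}

/* The part of a byte position that a sector number cannot represent. */
static unsigned int pos_sector_offset(demo_loff_t pos)
{
	return (unsigned int)(pos & ((1 << DEMO_SECTOR_SHIFT) - 1));
}
```

A byte offset of 768 maps to sector 1 and comes back as 512; the
remaining 256 bytes are exactly what a byte-granular position field
would need to hold.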
