On Thu, Jun 20, 2024 at 09:36:42AM -0400, Kent Overstreet wrote:
> On Thu, Jun 20, 2024 at 09:21:57PM +0800, Hongbo Li wrote:
> > Support fallback to buffered I/O if the operation being performed on
> > unaligned length or offset. This may change the behavior for direct
> > I/O in some cases.
> >
> > [Before]
> > For length which aligned with 256 bytes (not SECTOR aligned) will
> > read failed under direct I/O.
> >
> > [After]
> > For length which aligned with 256 bytes (not SECTOR aligned) will
> > read the data successfully under direct I/O because it will fallback
> > to buffer I/O.
This is against the O_DIRECT requirements.

O_DIRECT
The O_DIRECT flag may impose alignment restrictions on the length and
address of user-space buffers and the file offset of I/Os. In Linux
alignment restrictions vary by filesystem and kernel version and might
be absent entirely. The handling of misaligned O_DIRECT I/Os also
varies; they can either fail with EINVAL or fall back to buffered I/O.
Since Linux 6.1, O_DIRECT support and alignment restrictions for a file
can be queried using statx(2), using the STATX_DIOALIGN flag. Support
for STATX_DIOALIGN varies by filesystem; see statx(2).
Some filesystems provide their own interfaces for querying O_DIRECT
alignment restrictions, for example the XFS_IOC_DIOINFO operation in
xfsctl(3). STATX_DIOALIGN should be used instead when it is available.
If none of the above is available, then direct I/O support and alignment
restrictions can only be assumed from known characteristics of the
filesystem, the individual file, the underlying storage device(s), and
the kernel version. In Linux 2.4, most filesystems based on block
devices require that the file offset and the length and memory address of
all I/O segments be multiples of the filesystem block size (typically
4096 bytes). In Linux 2.6.0, this was relaxed to the logical block size
of the block device (typically 512 bytes). A block device's logical
block size can be determined using the ioctl(2) BLKSSZGET operation or
from the shell using the command:

    blockdev --getss
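
To make the 2.6.0-era rule quoted above concrete, here is a rough userspace sketch of the check it implies: buffer address, file offset, and length all have to be multiples of the logical block size. The helper names (is_multiple_of, dio_request_is_aligned) are hypothetical, not kernel or libc interfaces, and the block size is assumed to have been obtained separately (for example via BLKSSZGET). It says nothing about how any particular filesystem reacts to a misaligned request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if value is a multiple of align.
 * Assumes align is a non-zero power of two, which holds for the
 * usual logical block sizes (512, 4096).
 */
static bool is_multiple_of(uint64_t value, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value & (uint64_t)(align - 1)) == 0;
}

/*
 * Hypothetical helper: check one I/O segment against the rule that
 * the memory address, file offset and length must all be multiples
 * of the logical block size (lbs) of the underlying device.
 */
static bool dio_request_is_aligned(const void *buf, uint64_t offset,
				   size_t len, uint32_t lbs)
{
	return is_multiple_of((uint64_t)(uintptr_t)buf, lbs) &&
	       is_multiple_of(offset, lbs) &&
	       is_multiple_of((uint64_t)len, lbs);
}
```

With a 512-byte logical block size, a 256-byte length fails this check even when the buffer and offset are aligned, which is the case the patch description is about.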
> The catch is that struct bio - bvec_iter - represents addresses with a
> sector_t, and we'd want that to be a loff_t.
>
> That's something we should do anyways; everything else in struct bio can
> represent a byte-aligned io, bvec_iter.bi_sector is the only exception
> and fixing that would help in consolidating our various scatter-gather
> list data structures - but we'd need buy-in from Jens and Christoph
> before doing that.
I'm against it. Block devices only do sector-aligned IO and we should
not pretend otherwise.