Reply Sashiko comment:
> https://sashiko.dev/#/patchset/[email protected]
>
> > + if (writesize % dio_align != 0) {
> + ksft_test_result_skip("DIO alignment (%u) incompatible with
> offset %zu\n",
> + dio_align, writesize);
> + return;
> + }
>
> Is this alignment check complete?
>
> Direct I/O requires both the transfer length and the memory buffer address
> to be aligned. Later in this function, start_off is used as the buffer offset:
> buffer = orig_buffer;
> buffer += start_off;
> If start_off is pagesize / 2 (e.g., 2048) and writesize is pagesize * 3
> (e.g., 12288), writesize is a multiple of a 4096-byte alignment, so the test
> is not skipped.
>
> However, the memory buffer itself is only 2048-byte aligned. Will the
> subsequent write() still fail with -EINVAL on 4K-sector devices?
TL;DR: Yes, we should do both buffer address and writesize alignment
checks to satisfy all FS types.
Looking at the kernel code: fs/iomap/direct-dio.c, the only alignment
check there is at line#413, which checks file's pos and write length.
EXT4:
ext4_file_write_iter
ext4_dio_write_iter
iomap_dio_rw
__iomap_dio_rw
iomap_dio_iter
iomap_dio_bio_iter
390 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio
*dio)
391 {
...
403
404 /*
405 * File systems that write out of place and always allocate new
blocks
406 * need each bio to be block aligned as that's the unit of
allocation.
407 */
408 if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
409 alignment = fs_block_size;
410 else
411 alignment = bdev_logical_block_size(iomap->bdev);
412
413 if ((pos | length) & (alignment - 1))
414 return -EINVAL;
415 ...
Sashiko points out the buffer-address should do alignment check as well,
I firstly suspect it based on the FS extra check before the iomap_dio_rw:
ext4_file_write_iter
ext4_dio_write_iter
ext4_should_use_dio
iov_iter_alignment <--- do buffer/writesize alignment check
842 unsigned long iov_iter_alignment(const struct iov_iter *i)
843 {
...
853 return iov_iter_alignment_iovec(i);
...
865 }
799 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
800 {
...
809 res |= (unsigned long)iov->iov_base + skip;
812 res |= len;
...
818 return res;
819 }
But eventually I found that this is only fallback to the buffer I/O
when the direct I/O is unsupported (go to: ext4_buffered_write_iter).
This wouldn't happen in the test as it open with O_DIRECT flag.
Then, I turned to look at Btrfs path:
btrfs_file_write_iter
btrfs_do_write_iter
btrfs_direct_write
check_direct_IO <--- do buffer alignment check
...
btrfs_dio_write
__iomap_dio_rw <--- do samething like ext4
778 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
779 const struct iov_iter *iter, loff_t offset)
780 {
781 const u32 blocksize_mask = fs_info->sectorsize - 1;
782
783 if (offset & blocksize_mask)
784 return -EINVAL;
785
786 if (iov_iter_alignment(iter) & blocksize_mask)
787 return -EINVAL;
788 return 0;
789 }
Yes, here I found the evendice that iov_iter_alignment(iter) & blocksize_mask)
do the alignment check.
Unlike ext4 which never reaches the check for normal files, btrfs always checks
buffer alignment for every DIO operation. And it's a hard -EINVAL, not a silent
fallback to buffered I/O.
--
Regards,
Li Wang