On 3/5/2019 10:58 AM, Kevin Locke wrote:
> Sounds great. How do you propose that the kernel determine the
> optimal alignment?

md does it using the stripe size. I'm not sure it would make sense for anything other than md or dm to populate the value. Well, I guess hardware raid drivers too.

> I disagree that what you quoted says that the md driver uses
> optimal_io_size for anything, much less unconditionally. Also, since

Now that I read it again, it just says raid arrays, not md specifically, so I guess that means md plus hardware raid drivers. I know from experience that md does use it that way.

> the disk on which I am running parted is not a RAID array, I don't
> think the documentation above says that it is anything more than
> "preferred unit for sustained I/O".

Yes, the first part says that, but then it goes on to say that normal disks generally leave it zero, and raid disks set it to the stripe width.

>> Wait, how can optimal_io_size NOT be a multiple of the block size?
>
> For my disk, the logical block size is 512 bytes, the physical block
> size is 4,096, opt_xfer_blocks is 65,535, so optimal_io_size is
> 65,535*512=33,553,920 which is not a multiple of 4,096. I considered
> advocating that the kernel check this, but decided against it.

Oh, that is weird. I guess such a sanity check would fix the issue for your USB stick, but what about others?

> SCSI devices can report any value (measured in logical blocks) for VPD
> Optimal Transfer Length. It is not restricted to multiples of the
> physical block size. For my disk, it is not, which is the cause of
> the current issue.

So basically, for 512e disks the optimal transfer length need not be a multiple of the physical block size, and foolish drives try to specify the maximum possible value in logical 512-byte sectors, which ends up being 1 logical sector too small to align to 4k. For 512n and 4kn disks, the optimal size is always a multiple of the sector size, so the sanity check would pass and still give you a massive alignment you don't want.
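
To put numbers on that, here is a rough Python sketch of the arithmetic. It is not kernel or parted code. The function names and the "multiple of the physical block size" check are hypothetical stand-ins for the sanity check discussed above, and the 512n and 4kn rows assume a drive that reports the same 65,535 blocks as the 512e one.

```python
# Illustrative arithmetic only; names and the check itself are hypothetical.

def optimal_io_size(opt_xfer_blocks: int, logical_block_size: int) -> int:
    """Optimal I/O size in bytes; the transfer length is in logical blocks."""
    return opt_xfer_blocks * logical_block_size


def passes_sanity_check(opt_io: int, physical_block_size: int) -> bool:
    """Assumed check: accept only non-zero multiples of the physical block size."""
    return opt_io > 0 and opt_io % physical_block_size == 0


# (label, logical block size, physical block size)
DISKS = [("512e", 512, 4096), ("512n", 512, 512), ("4kn", 4096, 4096)]


def report(opt_xfer_blocks: int = 65535):
    """Return (label, optimal_io_size in bytes, check result) per disk type."""
    rows = []
    for label, logical, physical in DISKS:
        opt_io = optimal_io_size(opt_xfer_blocks, logical)
        rows.append((label, opt_io, passes_sanity_check(opt_io, physical)))
    return rows


if __name__ == "__main__":
    for label, opt_io, ok in report():
        print(f"{label}: optimal_io_size={opt_io} bytes, check passes: {ok}")
```

Only the 512e case fails the check (33,553,920 is 512 bytes short of a multiple of 4,096). The 512n and 4kn cases pass and would still yield alignments of roughly 32 MiB and 256 MiB.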

