On 2019/11/26 12:58, Javier González wrote: > On 26.11.2019 02:06, Damien Le Moal wrote: >> On 2019/11/26 4:03, Javier González wrote: >>> On 25.11.2019 00:48, Damien Le Moal wrote: >>>> On 2019/11/22 18:00, Javier González wrote: >>>>> From: Javier González <javier.g...@samsung.com> >>>>> >>>>> Fix file system corruption when using LFS mount (e.g., in zoned >>>>> devices). Seems like the fallback into buffered I/O creates an >>>>> inconsistency if the application is assuming both read and write DIO. I >>>>> can easily reproduce a corruption with a simple RocksDB test. >>>>> >>>>> Might be that the f2fs_forced_buffered_io path brings some problems too, >>>>> but I have not seen other failures besides this one. >>>>> >>>>> Problem reproducible without a zoned block device, simply by forcing >>>>> LFS mount: >>>>> >>>>> $ sudo mkfs.f2fs -f -m /dev/nvme0n1 >>>>> $ sudo mount /dev/nvme0n1 /mnt/f2fs >>>>> $ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0 >>>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>>> --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1 >>>>> --block_size=65536 >>>>> >>>>> Note that the options that cause the problem are: >>>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>>> >>>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write") >>>>> >>>>> Signed-off-by: Javier González <javier.g...@samsung.com> >>>>> --- >>>>> fs/f2fs/data.c | 3 --- >>>>> 1 file changed, 3 deletions(-) >>>>> >>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>>>> index 5755e897a5f0..b045dd6ab632 100644 >>>>> --- a/fs/f2fs/data.c >>>>> +++ b/fs/f2fs/data.c >>>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, >>>>> struct iov_iter *from) >>>>> return err; >>>>> } >>>>> >>>>> - if (direct_io && allow_outplace_dio(inode, iocb, from)) >>>>> - return 0; >>>> >>>> Since for LFS mode, all DIOs can end up out of place, I think that it >>>> may be better to change allow_outplace_dio() to always return true in >>>> the case of LFS mode. So may be something like: >>>> >>>> static inline int allow_outplace_dio(struct inode *inode, >>>> struct kiocb *iocb, struct iov_iter *iter) >>>> { >>>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>>> int rw = iov_iter_rw(iter); >>>> >>>> return test_opt(sbi, LFS) || >>>> (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >>>> } >>>> >>>> instead of the original: >>>> >>>> static inline int allow_outplace_dio(struct inode *inode, >>>> struct kiocb *iocb, struct iov_iter *iter) >>>> { >>>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>>> int rw = iov_iter_rw(iter); >>>> >>>> return (test_opt(sbi, LFS) && (rw == WRITE) && >>>> !block_unaligned_IO(inode, iocb, iter)); >>>> } >>>> >>>> Thoughts ? >>>> >>> >>> I see what you mean and it makes sense. However, the problem I am seeing >>> occurs when allow_outplace_dio() returns true, as this is what creates >>> the inconsistency between the write being buffered and the read being >>> DIO. >> >> But if the write is switched to buffered, the DIO read should use the >> buffered path too, no ? Since this is all happening under VFS, the >> generic DIO read path will not ensure that the buffered writes are >> flushed to disk before issuing the direct read, I think. So that would >> explain your data corruption, i.e. you are reading stale data on the >> device before the buffered writes make it to the media. >> > > As far as I can see, the read is always sent DIO, so yes, I also believe > that we are reading stale data. This is why the corruption is not seen > if preventing allow_outplace_dio() from sending the write to the > buffered path. > > What surprises me is that this is very easy to trigger (see commit), so > I assume you must have seen this with SMR in the past.
We just did. Shin'Ichiro in my team finally succeeded in recreating the problem. The cause seems to be: bool direct_io = iocb->ki_flags & IOCB_DIRECT; being true on entry of f2fs_preallocate_blocks() whereas f2fs_direct_IO() forces buffered IO path for DIO on zoned devices with: if (f2fs_force_buffered_io(inode, iocb, iter)) return 0; which has: if (f2fs_sb_has_blkzoned(sbi)) return true; So the top DIO code says "do buffered IOs", but lower in the write path, the IO is still assumed to be a DIO because of the iocb flag... That's inconsistent. Note that for the non-zoned device LFS case, f2fs_force_buffered_io() returns true only for unaligned write DIOs... But that will still trip on the iocb flag test. So the proper fix is likely something like: int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) { struct inode *inode = file_inode(iocb->ki_filp); struct f2fs_map_blocks map; int flag; int err = 0; - bool direct_io = iocb->ki_flags & IOCB_DIRECT; + bool direct_io = (iocb->ki_flags & IOCB_DIRECT) && + !2fs_force_buffered_io(inode, iocb, iter); /* convert inline data for Direct I/O*/ if (direct_io) { err = f2fs_convert_inline_inode(inode); if (err) return err; } Shin'Ichiro tried this on SMR disks and the failure is gone... Cheers. > > Does it make sense to leave the LFS check out of the > allow_outplace_dio()? Or in other words, is there a hard requirement for > writes to take this path on a zoned device that I am not seeing? > Something like: > > static inline int allow_outplace_dio(struct inode *inode, > struct kiocb *iocb, struct iov_iter *iter) > { > struct f2fs_sb_info *sbi = F2FS_I_SB(inode); > int rw = iov_iter_rw(iter); > > return (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); > } > > Thanks, > Javier > -- Damien Le Moal Western Digital Research _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel