+ Shin'Ichiro On 2019/11/26 15:19, Damien Le Moal wrote: > On 2019/11/26 12:58, Javier González wrote: >> On 26.11.2019 02:06, Damien Le Moal wrote: >>> On 2019/11/26 4:03, Javier González wrote: >>>> On 25.11.2019 00:48, Damien Le Moal wrote: >>>>> On 2019/11/22 18:00, Javier González wrote: >>>>>> From: Javier González <javier.g...@samsung.com> >>>>>> >>>>>> Fix file system corruption when using LFS mount (e.g., in zoned >>>>>> devices). Seems like the fallback into buffered I/O creates an >>>>>> inconsistency if the application is assuming both read and write DIO. I >>>>>> can easily reproduce a corruption with a simple RocksDB test. >>>>>> >>>>>> Might be that the f2fs_forced_buffered_io path brings some problems too, >>>>>> but I have not seen other failures besides this one. >>>>>> >>>>>> Problem reproducible without a zoned block device, simply by forcing >>>>>> LFS mount: >>>>>> >>>>>> $ sudo mkfs.f2fs -f -m /dev/nvme0n1 >>>>>> $ sudo mount /dev/nvme0n1 /mnt/f2fs >>>>>> $ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0 >>>>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>>>> --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1 >>>>>> --block_size=65536 >>>>>> >>>>>> Note that the options that cause the problem are: >>>>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>>>> >>>>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write") >>>>>> >>>>>> Signed-off-by: Javier González <javier.g...@samsung.com> >>>>>> --- >>>>>> fs/f2fs/data.c | 3 --- >>>>>> 1 file changed, 3 deletions(-) >>>>>> >>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>>>>> index 5755e897a5f0..b045dd6ab632 100644 >>>>>> --- a/fs/f2fs/data.c >>>>>> +++ b/fs/f2fs/data.c >>>>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, >>>>>> struct iov_iter *from) >>>>>> return err; >>>>>> } >>>>>> >>>>>> - if (direct_io && allow_outplace_dio(inode, iocb, from)) >>>>>> - return 0; >>>>> >>>>> Since for LFS mode, all DIOs can end up out of place, I think that it >>>>> may be better to change allow_outplace_dio() to always return true in >>>>> the case of LFS mode. So may be something like: >>>>> >>>>> static inline int allow_outplace_dio(struct inode *inode, >>>>> struct kiocb *iocb, struct iov_iter *iter) >>>>> { >>>>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>>>> int rw = iov_iter_rw(iter); >>>>> >>>>> return test_opt(sbi, LFS) || >>>>> (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >>>>> } >>>>> >>>>> instead of the original: >>>>> >>>>> static inline int allow_outplace_dio(struct inode *inode, >>>>> struct kiocb *iocb, struct iov_iter *iter) >>>>> { >>>>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>>>> int rw = iov_iter_rw(iter); >>>>> >>>>> return (test_opt(sbi, LFS) && (rw == WRITE) && >>>>> !block_unaligned_IO(inode, iocb, iter)); >>>>> } >>>>> >>>>> Thoughts ? >>>>> >>>> >>>> I see what you mean and it makes sense. However, the problem I am seeing >>>> occurs when allow_outplace_dio() returns true, as this is what creates >>>> the inconsistency between the write being buffered and the read being >>>> DIO. >>> >>> But if the write is switched to buffered, the DIO read should use the >>> buffered path too, no ? Since this is all happening under VFS, the >>> generic DIO read path will not ensure that the buffered writes are >>> flushed to disk before issuing the direct read, I think. So that would >>> explain your data corruption, i.e. you are reading stale data on the >>> device before the buffered writes make it to the media. >>> >> >> As far as I can see, the read is always sent DIO, so yes, I also believe >> that we are reading stale data. This is why the corruption is not seen >> if preventing allow_outplace_dio() from sending the write to the >> buffered path. >> >> What surprises me is that this is very easy to trigger (see commit), so >> I assume you must have seen this with SMR in the past. > > We just did. Shin'Ichiro in my team finally succeeded in recreating the > problem. The cause seems to be: > > bool direct_io = iocb->ki_flags & IOCB_DIRECT; > > being true on entry of f2fs_preallocate_blocks() whereas > f2fs_direct_IO() forces buffered IO path for DIO on zoned devices with: > > if (f2fs_force_buffered_io(inode, iocb, iter)) > return 0; > > which has: > > if (f2fs_sb_has_blkzoned(sbi)) > return true; > > So the top DIO code says "do buffered IOs", but lower in the write path, > the IO is still assumed to be a DIO because of the iocb flag... That's > inconsistent. > > Note that for the non-zoned device LFS case, f2fs_force_buffered_io() > returns true only for unaligned write DIOs... But that will still trip > on the iocb flag test. So the proper fix is likely something like: > > int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) > { > struct inode *inode = file_inode(iocb->ki_filp); > struct f2fs_map_blocks map; > int flag; > int err = 0; > - bool direct_io = iocb->ki_flags & IOCB_DIRECT; > + bool direct_io = (iocb->ki_flags & IOCB_DIRECT) && > + !2fs_force_buffered_io(inode, iocb, iter); > > /* convert inline data for Direct I/O*/ > if (direct_io) { > err = f2fs_convert_inline_inode(inode); > if (err) > return err; > } > > Shin'Ichiro tried this on SMR disks and the failure is gone... > > Cheers. > > >> >> Does it make sense to leave the LFS check out of the >> allow_outplace_dio()? Or in other words, is there a hard requirement for >> writes to take this path on a zoned device that I am not seeing? >> Something like: >> >> static inline int allow_outplace_dio(struct inode *inode, >> struct kiocb *iocb, struct iov_iter *iter) >> { >> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >> int rw = iov_iter_rw(iter); >> >> return (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >> } >> >> Thanks, >> Javier >> > >
-- Damien Le Moal Western Digital Research _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel