Hi, On 05/12/17 04:44, Jaegeuk Kim wrote: > Hi, > > On 05/10, Raouf Rokhjavan wrote: > ... > >> As you told to use snapshot mechanism to prevent changing ckpt number after >> each mount, I ran again generic tests of xfstests framework on top of >> log-writes target with f2fs file system. In order to automate reporting an >> inconsistency situation, I add a parameter to fsck.f2fs to return(-1) when >> c.bug_on condition is met. To evaluate how f2fs react in case of crash >> consistency, I replay each log and check the consistency of f2fs with a my >> own modified version of fsck.f2fs. Accordingly, all tests passed smoothly >> except these tests: >> >> [FAIL] Running generic/013 failed. (consistency_single) > Could you check whether any IO made by mkfs was added in the replay log? > If so, fsck.f2fs should be failed when replaying them. These two functions undertake the whole logic which run different tests including sysbench dbtests.
apply_test() { local aTest aTest=$1 # Create log-writes TABLE="0 ${BLKSIZE} log-writes ${REPLAYDEV} ${LOGDEV}" dmsetup create ${DEVMAP_LOG_WRITES} --table "${TABLE}" # Mark mkfs dmsetup message ${DEVMAP_LOG_WRITES} 0 mark mkfs_start ${TOOLS_DIR}/mkfs.f2fs ${DEVMAP_PART} dmsetup message ${DEVMAP_LOG_WRITES} 0 mark mkfs_end # Apply the test and mark it dmsetup message ${DEVMAP_LOG_WRITES} 0 mark ${aTest}_start ./check -E ./${CCTESTS_EXCLUDE} -s ${FSTYPE}_consistency ${aTest} dmsetup message ${DEVMAP_LOG_WRITES} 0 mark ${aTest}_end # Remove log-writes dmsetup remove ${DEVMAP_LOG_WRITES} } apply_consistency_test() { local aTest aTest=$1 echo "Replaying mkfs ..." ENTRY=$(${TOOLS_DIR}/replay-log --log $LOGDEV --find --end-mark mkfs_end) ${TOOLS_DIR}/replay-log --log $LOGDEV --replay $REPLAYDEV --end-mark mkfs_end START_ENTRY=$(${TOOLS_DIR}/replay-log --log $LOGDEV --find --end-mark ${aTest}_start) END_ENTRY=$(${TOOLS_DIR}/replay-log --log $LOGDEV --find --end-mark ${aTest}_end) echo "Replaying test $aTest ..." echo "START_ENTRY is $START_ENTRY" echo "END_ENTRY is $END_ENTRY" ${TOOLS_DIR}/replay-log -v --log $LOGDEV --replay $REPLAYDEV --start-mark ${aTest}_start\ --fsck "$FSCK_SCRIPT" --check 1 } First I call apply_test() and then apply_consistency_test(). I ran generic test #013 a couple of times, but It failed haphazardly; what I mean is that it fails one time and passes another time! In the case of sysbench tests, the same thing happened. I wonder whether the order of replaying IOs can cause this behavior or not. >> [FAIL] Running generic/070 failed. (consistency_single) >> [FAIL] Running generic/113 failed. (consistency_single) > I added a mark to replay in the beginning of generic/113, and ran the test. > But, I couldn't find any error given test_dev as a log_dev. (I tested this > in the latest f2fs/dev-test branch.) > >> [FAIL] Running generic/241 failed. (consistency_single) >> >> In other words, in these tests, c.bug_on() was true. Would you please >> describe why they become inconsistent? >> >> Besides, I ran sysbench for database benchmark with 1 thread, 1000 records, >> and 100 transactions on top of log-writes target with f2fs. Interestingly, I >> encountered a weird inconsistency. After replaying about 100 logs, fsck.f2fs >> complains about inconsistency with the following messages: > Can you share the parameter for sysbench? After starting the mariadb and copying /var/lib/mysql contents into f2fs partition, These commands are run: #sysbench --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua --db-driver=mysql --oltp-table-size=1000 --mysql-db=sysbench --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock --mysql-user=sysbench --mysql-password=password prepare #sysbench --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua --db-driver=mysql --oltp-table-size=1000 --mysql-db=sysbench --mysql-user=sysbench --mysql-password=password --max-requests=100 --num-threads=1 --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock run #sysbench --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua --db-driver=mysql --mysql-db=sysbench --mysql-user=sysbench --mysql-password=password --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock cleanup These steps are done in a script file. Honestly speaking, I couldn't find any form of pattern to reproduce these errors. The most intriguing part of these tests which is in common between all of them is that after fsck.f2fs pops out a lot messages on the screen and I cancel its execution with accepting fix partition, it doesn't generate the same output again and says every thing goes well with [ok] !!! Why does it behave in this way? Regards > > Thanks, > >> Info: Segments per section = 1 >> Info: Sections per zone = 1 >> Info: sector size = 512 >> Info: total sectors = 2097152 (1024 MB) >> Info: MKFS version >> "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 >> 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017" >> Info: FSCK version >> from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version >> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST >> 2017" >> to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version >> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST >> 2017" >> Info: superblock features = 0 : >> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000 >> Info: total FS sectors = 2097152 (1024 MB) >> Info: CKPT version = 2b59c128 >> Info: checkpoint state = 44 : compacted_summary sudden-power-off >> [ASSERT] (sanity_check_nid: 388) --> nid[0x6] nat_entry->ino[0x6] >> footer.ino[0x0] >> >> NID[0x6] is unreachable >> NID[0x7] is unreachable >> [FSCK] Unreachable nat entries [Fail] [0x2] >> [FSCK] SIT valid block bitmap checking [Fail] >> [FSCK] Hard link checking for regular file [Ok..] [0x0] >> [FSCK] valid_block_count matching with CP [Fail] [0x6dc9] >> [FSCK] valid_node_count matcing with CP (de lookup) [Fail] [0xe3] >> [FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0xe5] >> [FSCK] valid_inode_count matched with CP [Fail] [0x63] >> [FSCK] free segment_count matched with CP [Ok..] [0x1c6] >> [FSCK] next block offset is free [Ok..] >> [FSCK] fixing SIT types >> [FSCK] other corrupted bugs [Fail] >> >> After canceling the test by using Ctrl-C without answering any YES/NO >> questions, on another terminal I run fsck.f2fs again, but the output is >> completely different: >> [root@localhost CrashConsistencyTest]# ./locals/usr/local/sbin/fsck.f2fs >> /dev/sdc >> Info: [/dev/sdc] Disk Model: VMware Virtual S1.0 >> Info: Segments per section = 1 >> Info: Sections per zone = 1 >> Info: sector size = 512 >> Info: total sectors = 2097152 (1024 MB) >> Info: MKFS version >> "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 >> 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017" >> Info: FSCK version >> from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version >> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST >> 2017" >> to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version >> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST >> 2017" >> Info: superblock features = 0 : >> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000 >> Info: total FS sectors = 2097152 (1024 MB) >> Info: CKPT version = 2b59c128 >> Info: checkpoint state = 44 : compacted_summary sudden-power-off >> >> [FSCK] Unreachable nat entries [Ok..] [0x0] >> [FSCK] SIT valid block bitmap checking [Ok..] >> [FSCK] Hard link checking for regular file [Ok..] [0x0] >> [FSCK] valid_block_count matching with CP [Ok..] [0x6dcf] >> [FSCK] valid_node_count matcing with CP (de lookup) [Ok..] [0xe5] >> [FSCK] valid_node_count matcing with CP (nat lookup) [Ok..] [0xe5] >> [FSCK] valid_inode_count matched with CP [Ok..] [0x64] >> [FSCK] free segment_count matched with CP [Ok..] [0x1c6] >> [FSCK] next block offset is free [Ok..] >> [FSCK] fixing SIT types >> [FSCK] other corrupted bugs [Ok..] >> >> This situation raises a couple of questions: >> 1. How does an inconsistent file system turn into a consistent one in this >> case? >> 2. Why does an inconsistency occur in different log numbers; in other words, >> why is it unpredictable? Does ordering of logs have to do with disk >> controller and I/O scheduler? >> >> I do appreciate for your help. >> Regards ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel