Hi,

On 05/12, Raouf Rokhjavan wrote:
> Hi,
>
> On 05/12/17 04:44, Jaegeuk Kim wrote:
> > Hi,
> >
> > On 05/10, Raouf Rokhjavan wrote:
> >
> > ...
> >
> > > As you suggested, I used the snapshot mechanism to prevent the ckpt
> > > number from changing after each mount, and ran the generic tests of
> > > the xfstests framework again on top of the log-writes target with the
> > > f2fs file system. To automate reporting an inconsistency, I added a
> > > parameter to fsck.f2fs so that it returns -1 when the c.bug_on
> > > condition is met. To evaluate how f2fs reacts to a crash, I replay
> > > each log and check the consistency of f2fs with my own modified
> > > version of fsck.f2fs. With that, all tests passed smoothly except
> > > these:
> > >
> > > [FAIL] Running generic/013 failed. (consistency_single)
> >
> > Could you check whether any IO made by mkfs was added in the replay log?
> > If so, fsck.f2fs should fail when replaying them.
>
> These two functions implement the whole logic that runs the different
> tests, including the sysbench db tests.
>
> apply_test()
> {
>     local aTest
>     aTest=$1
>
>     # Create log-writes
>     TABLE="0 ${BLKSIZE} log-writes ${REPLAYDEV} ${LOGDEV}"
>     dmsetup create ${DEVMAP_LOG_WRITES} --table "${TABLE}"
>
>     # Mark mkfs
>     dmsetup message ${DEVMAP_LOG_WRITES} 0 mark mkfs_start
>     ${TOOLS_DIR}/mkfs.f2fs ${DEVMAP_PART}
>     dmsetup message ${DEVMAP_LOG_WRITES} 0 mark mkfs_end
>
>     # Apply the test and mark it
>     dmsetup message ${DEVMAP_LOG_WRITES} 0 mark ${aTest}_start
>     ./check -E ./${CCTESTS_EXCLUDE} -s ${FSTYPE}_consistency ${aTest}
Here, what I did was add the above start mark in the test script directly.
For example, I added it just before running fsstress in tests/generic/070,
since each test may call mkfs itself, which wipes the device along with the
log entries. In that case, fsck would not find the superblock, resulting in
a failure.

Thanks,

>     dmsetup message ${DEVMAP_LOG_WRITES} 0 mark ${aTest}_end
>
>     # Remove log-writes
>     dmsetup remove ${DEVMAP_LOG_WRITES}
> }
>
> apply_consistency_test()
> {
>     local aTest
>     aTest=$1
>
>     echo "Replaying mkfs ..."
>     ENTRY=$(${TOOLS_DIR}/replay-log --log $LOGDEV --find --end-mark mkfs_end)
>     ${TOOLS_DIR}/replay-log --log $LOGDEV --replay $REPLAYDEV --end-mark mkfs_end
>
>     START_ENTRY=$(${TOOLS_DIR}/replay-log --log $LOGDEV --find --end-mark ${aTest}_start)
>     END_ENTRY=$(${TOOLS_DIR}/replay-log --log $LOGDEV --find --end-mark ${aTest}_end)
>
>     echo "Replaying test $aTest ..."
>     echo "START_ENTRY is $START_ENTRY"
>     echo "END_ENTRY is $END_ENTRY"
>
>     ${TOOLS_DIR}/replay-log -v --log $LOGDEV --replay $REPLAYDEV \
>         --start-mark ${aTest}_start --fsck "$FSCK_SCRIPT" --check 1
> }
>
> First I call apply_test() and then apply_consistency_test().
>
> I ran generic test #013 a couple of times, but it failed haphazardly;
> what I mean is that it fails one time and passes another time. The same
> thing happened with the sysbench tests. I wonder whether the order of
> replaying IOs can cause this behavior or not.
>
> > > [FAIL] Running generic/070 failed. (consistency_single)
> > > [FAIL] Running generic/113 failed. (consistency_single)
> >
> > I added a mark to replay in the beginning of generic/113, and ran the
> > test. But I couldn't find any error given test_dev as a log_dev. (I
> > tested this in the latest f2fs/dev-test branch.)
> >
> > > [FAIL] Running generic/241 failed. (consistency_single)
> > >
> > > In other words, in these tests, c.bug_on() was true. Would you please
> > > describe why they become inconsistent?
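As a reference point, the `$FSCK_SCRIPT` hook handed to replay-log's --fsck
option could be as small as the sketch below. This is only a hypothetical
example: it assumes, as described earlier in the thread, that the patched
fsck.f2fs exits non-zero when the c.bug_on condition is met, and the
`check_consistency` helper and `FSCK_BIN` variable are names invented here,
not part of f2fs-tools.

```shell
#!/bin/bash
# Hypothetical $FSCK_SCRIPT hook for replay-log's --fsck option.
# Assumption: the patched fsck.f2fs exits non-zero when c.bug_on() fires.
# FSCK_BIN and check_consistency are illustrative names, not real tools.

FSCK_BIN=${FSCK_BIN:-fsck.f2fs}

check_consistency() {
    local dev=$1
    # Run fsck quietly; treat any non-zero exit as an inconsistency.
    if ! "$FSCK_BIN" "$dev" >/dev/null 2>&1; then
        echo "inconsistency detected on $dev" >&2
        return 1
    fi
    return 0
}

# With --check 1, replay-log runs this hook after every replayed entry;
# a non-zero exit stops the replay at the first inconsistent state.
if [ $# -gt 0 ]; then
    check_consistency "$1"
fi
```

The device to check would either be passed as an argument or hardcoded to
`$REPLAYDEV`, depending on how replay-log invokes the hook in this setup.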
> > >
> > > Besides, I ran sysbench as a database benchmark with 1 thread, 1000
> > > records, and 100 transactions on top of the log-writes target with
> > > f2fs. Interestingly, I encountered a weird inconsistency: after
> > > replaying about 100 logs, fsck.f2fs complains about an inconsistency
> > > with the following messages:
> >
> > Can you share the parameters for sysbench?
>
> After starting mariadb and copying the /var/lib/mysql contents into the
> f2fs partition, these commands are run:
>
> #sysbench \
>     --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua \
>     --db-driver=mysql --oltp-table-size=1000 --mysql-db=sysbench \
>     --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock \
>     --mysql-user=sysbench --mysql-password=password prepare
>
> #sysbench \
>     --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua \
>     --db-driver=mysql --oltp-table-size=1000 --mysql-db=sysbench \
>     --mysql-user=sysbench --mysql-password=password --max-requests=100 \
>     --num-threads=1 \
>     --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock run
>
> #sysbench \
>     --test=/home/roraouf/Projects/CrashConsistencyTest/locals/var/lib/dbtests/sysbench-lua/tests/db/oltp_write_only.lua \
>     --db-driver=mysql --mysql-db=sysbench --mysql-user=sysbench \
>     --mysql-password=password \
>     --mysql-socket=/mnt/crash_consistency/f2fs/mysql/mysql.sock cleanup
>
> These steps are done in a script file.
>
> Honestly speaking, I couldn't find any pattern that reproduces these
> errors.
>
> The most intriguing part, common to all of these tests, is that after
> fsck.f2fs prints a lot of messages on the screen and I cancel its
> execution while accepting to fix the partition, it doesn't generate the
> same output again and reports everything as [Ok..]! Why does it behave
> this way?
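Since the failures show up nondeterministically, one practical way to pin
them down is to repeat the workload in a loop and stop at the first
inconsistency. The sketch below is only an illustration of that idea:
`run_workload` and `check_consistency` are hypothetical stand-ins for the
apply_test()/apply_consistency_test() pair quoted earlier, not functions
that exist in the script.

```shell
#!/bin/bash
# Sketch: repeat the workload until fsck reports an inconsistency, so the
# haphazard failures above can be caught on a specific iteration.
# run_workload and check_consistency are hypothetical stand-ins for the
# apply_test() / apply_consistency_test() pair in the original script.

hunt_failure() {
    local max_iter=$1 i
    for ((i = 1; i <= max_iter; i++)); do
        # A workload error is a setup problem, not a consistency bug.
        run_workload || return 2
        if ! check_consistency; then
            echo "inconsistency reproduced on iteration $i"
            return 1
        fi
    done
    echo "no inconsistency after $max_iter iterations"
    return 0
}
```

Saving the log-writes log from the failing iteration (before tearing the
target down) would then give a replayable trace of the bad run.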
> Regards

> > Thanks,
> >
> > > Info: Segments per section = 1
> > > Info: Sections per zone = 1
> > > Info: sector size = 512
> > > Info: total sectors = 2097152 (1024 MB)
> > > Info: MKFS version
> > >   "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
> > > Info: FSCK version
> > >   from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
> > >   to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
> > > Info: superblock features = 0 :
> > > Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
> > > Info: total FS sectors = 2097152 (1024 MB)
> > > Info: CKPT version = 2b59c128
> > > Info: checkpoint state = 44 : compacted_summary sudden-power-off
> > > [ASSERT] (sanity_check_nid: 388) --> nid[0x6] nat_entry->ino[0x6] footer.ino[0x0]
> > >
> > > NID[0x6] is unreachable
> > > NID[0x7] is unreachable
> > > [FSCK] Unreachable nat entries                        [Fail] [0x2]
> > > [FSCK] SIT valid block bitmap checking                [Fail]
> > > [FSCK] Hard link checking for regular file            [Ok..] [0x0]
> > > [FSCK] valid_block_count matching with CP             [Fail] [0x6dc9]
> > > [FSCK] valid_node_count matcing with CP (de lookup)   [Fail] [0xe3]
> > > [FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xe5]
> > > [FSCK] valid_inode_count matched with CP              [Fail] [0x63]
> > > [FSCK] free segment_count matched with CP             [Ok..] [0x1c6]
> > > [FSCK] next block offset is free                      [Ok..]
> > > [FSCK] fixing SIT types
> > > [FSCK] other corrupted bugs                           [Fail]
> > >
> > > After canceling the test with Ctrl-C, without answering any YES/NO
> > > questions, I run fsck.f2fs again on another terminal, but the output
> > > is completely different:
> > >
> > > [root@localhost CrashConsistencyTest]# ./locals/usr/local/sbin/fsck.f2fs /dev/sdc
> > > Info: [/dev/sdc] Disk Model: VMware Virtual S1.0
> > > Info: Segments per section = 1
> > > Info: Sections per zone = 1
> > > Info: sector size = 512
> > > Info: total sectors = 2097152 (1024 MB)
> > > Info: MKFS version
> > >   "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
> > > Info: FSCK version
> > >   from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
> > >   to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 2017"
> > > Info: superblock features = 0 :
> > > Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
> > > Info: total FS sectors = 2097152 (1024 MB)
> > > Info: CKPT version = 2b59c128
> > > Info: checkpoint state = 44 : compacted_summary sudden-power-off
> > >
> > > [FSCK] Unreachable nat entries                        [Ok..] [0x0]
> > > [FSCK] SIT valid block bitmap checking                [Ok..]
> > > [FSCK] Hard link checking for regular file            [Ok..] [0x0]
> > > [FSCK] valid_block_count matching with CP             [Ok..] [0x6dcf]
> > > [FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0xe5]
> > > [FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xe5]
> > > [FSCK] valid_inode_count matched with CP              [Ok..] [0x64]
> > > [FSCK] free segment_count matched with CP             [Ok..] [0x1c6]
> > > [FSCK] next block offset is free                      [Ok..]
> > > [FSCK] fixing SIT types
> > > [FSCK] other corrupted bugs                           [Ok..]
> > >
> > > This situation raises a couple of questions:
> > > 1. How does an inconsistent file system turn into a consistent one in
> > >    this case?
> > > 2. Why does an inconsistency occur at different log numbers; in other
> > >    words, why is it unpredictable? Does the ordering of logs have to
> > >    do with the disk controller and the I/O scheduler?
> > >
> > > I appreciate your help.
> > > Regards

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel