Re: [f2fs-dev] f2fs Crash Consistency Problem

Raouf Rokhjavan Wed, 10 May 2017 10:51:37 -0700


On 04/14/17 01:49, Jaegeuk Kim wrote:
> Hello,
>
> On 04/13, Raouf Rokhjavan wrote:
>> Hi
>>
>> The flash-friendly features of f2fs have motivated me to use it as the
>> rootfs in my project. Since one of my main considerations is resilience
>> in the face of power failure, I've been looking for techniques to verify
>> this property. I ended up finding a wonderful device-mapper target for
>> this purpose, developed by Josef Bacik, who is a btrfs developer.
>>
>> As you know, the log-writes target logs all bios that are passed to the
>> block layer and preserves the order of logging, so that the file system's
>> logic for maintaining consistency can be checked on replay. To take
>> advantage of this tool to verify the consistency of an f2fs file system
>> after power failure, I combined the xfstests test suite with log-writes.
>> Given the LFS-based nature of f2fs, I expected never to encounter an
>> inconsistency, but the test results show otherwise.
>>
>> To clarify, this is my test environment:
>>
>>       - Fedora 24
>>
>>       - kernel 4.9.8 - f2fs was compiled as a module with all features
>>
>>           - mount options: default + noatime
>>
>>       - f2fs-tools 1.8.0
>>
>>       - xfstests  1.1.1 - from https://github.com/jaegeuk/xfstests-f2fs
>>
>>       - device-mapper 1.02.122
>>
>> In my test environment, I run each generic test of xfstests on a
>> log-writes device with a newly created f2fs. After that, I replay the log
>> after the mkfs, one entry at a time, and check the consistency of the
>> file system with fsck.f2fs. In test #009, which is the fallocate test
>> with FALLOC_FL_ZERO_RANGE mode, fsck.f2fs complains after a while:
> I just ran log-writer with fsstress and found one issue when replaying IOs.
> If you replay after mkfs.f2fs, you get the wrong valid checkpoint which was
> overwritten by previous run. IOWs, at the beginning of replay, there was
> no *correct* checkpoint representing that initial moment. So, I think you
> need to replay the log including mkfs.
>
> You can verify the below CKPT version info.
>
>> Info: [/dev/sdc] Disk Model: VMware Virtual S1.0
>> Info: Segments per section = 1
>> Info: Sections per zone = 1
>> Info: sector size = 512
>> Info: total sectors = 2097152 (1024 MB)
>> Info: MKFS version
>>     "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>> Info: FSCK version
>>     from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>>       to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version
>> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST
>> 2017"
>> Info: superblock features = 0 :
>> Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
>> Info: total FS sectors = 2097152 (1024 MB)
>> Info: CKPT version = 2a4679e0
>> Info: checkpoint state = 45 :  compacted_summary unmount
>>
>> NID[0x4c] is unreachable
>> NID[0x4d] is unreachable
>> [FSCK] Unreachable nat entries                        [Fail] [0x2]
>> [FSCK] SIT valid block bitmap checking                [Ok..]
>> [FSCK] Hard link checking for regular file            [Ok..] [0x0]
>> [FSCK] valid_block_count matching with CP             [Ok..] [0x2]
>> [FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0x1]
>> [FSCK] valid_node_count matcing with CP (nat lookup)  [Fail] [0x3]
>> [FSCK] valid_inode_count matched with CP              [Ok..] [0x1]
>> [FSCK] free segment_count matched with CP             [Ok..] [0x1f0]
>> [FSCK] next block offset is free                      [Ok..]
>> [FSCK] fixing SIT types
>> [FSCK] other corrupted bugs                           [Fail]
>>
>> Do you want to restore lost files into ./lost_found/? [Y/N] Y
>>    - File name         : 009.48244.2
>>    - File size         : 20,480 (bytes)
>> Do you want to fix this partition? [Y/N] Y
>>
>> Interestingly, when I run fsck.f2fs with the -p option, it doesn't
>> complain at all!
>>
>> Would you please tell me why fsck.f2fs reports an inconsistency which
>> needs to be fixed? Does it violate the crash consistency promise of f2fs?
> As I mentioned above, I guess you replayed with "--start-mark mkfs", which
> will lose the initial checkpoint.
>
>> Moreover, why is fsck.f2fs silent with the -p option? Does it mean that
>> the f2fs kernel module does not consider it serious?
> The option is -p [level], and the default level is zero, which checks the
> image only if the runtime f2fs reported a bug case before. Otherwise, it
> simply returns. If you set level 1, fsck.f2fs will check basic FS metadata
> parts.
>
> Thanks,
>
>> I really appreciate your help.
>>
>> Thanks
>>
Hello,


As you suggested, I used a snapshot mechanism to prevent the checkpoint
version from changing after each mount, and I ran the generic tests of the
xfstests framework again on top of the log-writes target with an f2fs file
system. To automate the reporting of inconsistencies, I added a parameter
to fsck.f2fs that makes it return -1 when the c.bug_on condition is met.
To evaluate how f2fs behaves with respect to crash consistency, I replay
each log entry and check the consistency of f2fs with my modified version
of fsck.f2fs. All tests passed smoothly except these:

[FAIL] Running generic/013 failed. (consistency_single)
[FAIL] Running generic/070 failed. (consistency_single)
[FAIL] Running generic/113 failed. (consistency_single)
[FAIL] Running generic/241 failed. (consistency_single)

In other words, c.bug_on was set in these tests. Would you please
explain why the file system becomes inconsistent in them?
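
As an alternative to patching fsck.f2fs, the pass/fail decision can also be
derived from its printed summary. Below is a minimal Python sketch of that
idea. It assumes the summary-line format shown in the reports in this
thread; the function names failed_checks and exit_status are hypothetical,
and treating any [ASSERT] line as a failure is my own policy, not something
fsck.f2fs defines.

```python
import re
from typing import List

# Matches fsck.f2fs summary lines such as:
#   [FSCK] Unreachable nat entries                        [Fail] [0x2]
# The trailing hex value is optional (some checks print only the status).
SUMMARY_RE = re.compile(
    r"^\[FSCK\]\s+(.*?)\s+\[(Ok\.\.|Fail)\](?:\s+\[(0x[0-9a-fA-F]+)\])?\s*$"
)


def failed_checks(report: str) -> List[str]:
    """Return the names of all summary checks marked [Fail]."""
    failed = []
    for line in report.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match and match.group(2) == "Fail":
            failed.append(match.group(1))
    return failed


def exit_status(report: str) -> int:
    """Assumed policy: 1 if any check failed or an [ASSERT] fired, else 0."""
    if failed_checks(report) or "[ASSERT]" in report:
        return 1
    return 0
```

Lines without a status column, such as "[FSCK] fixing SIT types", are
ignored by the parser.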

Besides, I ran sysbench as a database benchmark with 1 thread, 1000
records, and 100 transactions on top of the log-writes target with f2fs.
Interestingly, I encountered a strange inconsistency. After replaying
about 100 log entries, fsck.f2fs complained with the following messages:

Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 2097152 (1024 MB)
Info: MKFS version
   "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 
2017"
Info: FSCK version
   from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 
2017"
     to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 
2017"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 2097152 (1024 MB)
Info: CKPT version = 2b59c128
Info: checkpoint state = 44 :  compacted_summary sudden-power-off
[ASSERT] (sanity_check_nid: 388)  --> nid[0x6] nat_entry->ino[0x6] footer.ino[0x0]

NID[0x6] is unreachable
NID[0x7] is unreachable
[FSCK] Unreachable nat entries                        [Fail] [0x2]
[FSCK] SIT valid block bitmap checking                [Fail]
[FSCK] Hard link checking for regular file            [Ok..] [0x0]
[FSCK] valid_block_count matching with CP             [Fail] [0x6dc9]
[FSCK] valid_node_count matcing with CP (de lookup)   [Fail] [0xe3]
[FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xe5]
[FSCK] valid_inode_count matched with CP              [Fail] [0x63]
[FSCK] free segment_count matched with CP             [Ok..] [0x1c6]
[FSCK] next block offset is free                      [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs                           [Fail]

After cancelling the test with Ctrl-C, without answering any of the yes/no
questions, I ran fsck.f2fs again in another terminal, but the output was
completely different:
[root@localhost CrashConsistencyTest]# ./locals/usr/local/sbin/fsck.f2fs /dev/sdc
Info: [/dev/sdc] Disk Model: VMware Virtual S1.0
Info: Segments per section = 1
Info: Sections per zone = 1
Info: sector size = 512
Info: total sectors = 2097152 (1024 MB)
Info: MKFS version
   "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 
2017"
Info: FSCK version
   from "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 
2017"
     to "Linux version 4.9.8 (rora...@desktopr.example.com) (gcc version 
4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Feb 7 08:24:57 IRST 
2017"
Info: superblock features = 0 :
Info: superblock encrypt level = 0, salt = 00000000000000000000000000000000
Info: total FS sectors = 2097152 (1024 MB)
Info: CKPT version = 2b59c128
Info: checkpoint state = 44 :  compacted_summary sudden-power-off

[FSCK] Unreachable nat entries                        [Ok..] [0x0]
[FSCK] SIT valid block bitmap checking                [Ok..]
[FSCK] Hard link checking for regular file            [Ok..] [0x0]
[FSCK] valid_block_count matching with CP             [Ok..] [0x6dcf]
[FSCK] valid_node_count matcing with CP (de lookup)   [Ok..] [0xe5]
[FSCK] valid_node_count matcing with CP (nat lookup)  [Ok..] [0xe5]
[FSCK] valid_inode_count matched with CP              [Ok..] [0x64]
[FSCK] free segment_count matched with CP             [Ok..] [0x1c6]
[FSCK] next block offset is free                      [Ok..]
[FSCK] fixing SIT types
[FSCK] other corrupted bugs                           [Ok..]
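
To make the difference between the two reports above easier to see, the
summary lines can be compared mechanically. This is a minimal Python
sketch, again assuming only the summary-line format printed above; the
names parse_summary and diff_reports are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# One fsck.f2fs summary line: name, status, and an optional hex counter.
LINE_RE = re.compile(
    r"^\[FSCK\]\s+(.*?)\s+\[(Ok\.\.|Fail)\](?:\s+\[(0x[0-9a-fA-F]+)\])?\s*$"
)

Entry = Tuple[str, Optional[int]]


def parse_summary(report: str) -> Dict[str, Entry]:
    """Map each check name to (status, counter or None)."""
    result: Dict[str, Entry] = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            value = int(match.group(3), 16) if match.group(3) else None
            result[match.group(1)] = (match.group(2), value)
    return result


def diff_reports(first: str, second: str) -> Dict[str, Tuple[Entry, Entry]]:
    """Return only the checks whose status or counter differs."""
    a, b = parse_summary(first), parse_summary(second)
    return {
        name: (a[name], b[name])
        for name in a
        if name in b and a[name] != b[name]
    }
```

Applied to the two reports, this highlights that the counters themselves
changed between the runs, not only the verdicts: valid_block_count went
from 0x6dc9 to 0x6dcf, the node count by de lookup from 0xe3 to 0xe5, and
valid_inode_count from 0x63 to 0x64.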

This situation raises a couple of questions:
1. How does an inconsistent file system turn into a consistent one in
this case?
2. Why does the inconsistency occur at different log entry numbers; in
other words, why is it unpredictable? Does the ordering of the log
entries depend on the disk controller and the I/O scheduler?
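
For question 2, one way to pin down where the failure appears is to record
the first log entry at which the check fails, and repeat that over several
runs. The Python sketch below shows the loop only. The replay-log flags
(--log, --replay, --limit) are written as I recall them from the log-writes
tooling and should be verified against its usage text; the function names,
the run callback, and the convention that the fsck command exits non-zero
on inconsistency (as my modified fsck.f2fs does) are assumptions.

```python
from typing import Callable, List, Optional

# A runner takes an argv list and returns the process exit code, e.g.
#   lambda argv: subprocess.run(argv).returncode
Runner = Callable[[List[str]], int]


def build_replay_cmd(log_dev: str, replay_dev: str, limit: int) -> List[str]:
    """Build a replay-log invocation that replays the first `limit` entries.

    Flag names are an assumption; check them against `replay-log` usage.
    """
    return ["replay-log", "--log", log_dev, "--replay", replay_dev,
            "--limit", str(limit)]


def first_failing_entry(num_entries: int, log_dev: str, replay_dev: str,
                        fsck_cmd: List[str], run: Runner) -> Optional[int]:
    """Replay 1..num_entries entries and return the first failing count.

    Returns None if the check passes at every point. Each step replays
    from the start of the log, which is slow but keeps steps independent.
    """
    for limit in range(1, num_entries + 1):
        if run(build_replay_cmd(log_dev, replay_dev, limit)) != 0:
            raise RuntimeError("replay failed at limit %d" % limit)
        if run(fsck_cmd) != 0:
            return limit
    return None
```

If the returned entry number differs between runs of the same workload,
the logs themselves differ; if it differs between replays of the same log,
the replay-and-check procedure is the first thing to suspect.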

I really appreciate your help.
Regards

