On Mon, 02 Sep 2013 17:27:44 +0400, Vyacheslav Dubeyko wrote:
> From: Vyacheslav Dubeyko <[email protected]>
> Subject: [PATCH] [CRITICAL] nilfs2: fix issue with race condition of
> competition between segments for dirty blocks
Before considering to apply this patch, please confirm that the
suspected issue, "dirty buffer_heads are added in several lists of
payload_buffers" you pointed out, is actually happening.
The easiest way for this is testing the following patch:
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index bd88a74..cab38db 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -668,6 +668,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode
*inode,
if (!buffer_dirty(bh))
continue;
get_bh(bh);
+ BUG_ON(!list_empty(&bh->b_assoc_buffers));
list_add_tail(&bh->b_assoc_buffers, listp);
ndirties++;
if (unlikely(ndirties >= nlimit)) {
@@ -701,6 +702,7 @@ static void nilfs_lookup_dirty_node_buffers(struct inode
*inode,
do {
if (buffer_dirty(bh)) {
get_bh(bh);
+
BUG_ON(!list_empty(&bh->b_assoc_buffers));
list_add_tail(&bh->b_assoc_buffers,
listp);
}
You can rewrite the above BUG_ON() check with the following debug code
if you want to get detail information on the buffer:
if (!list_empty(&bh->b_assoc_buffers)) {
/* dump information on bh */
}
As a workaround, we can avoid double registration to different payload
lists with list_empty(&bh->b_assoc_buffers) instead of adding
BH_async_write flag.
However, I think we should first narrow down the root cause of why the
issue "dirty buffer_heads are added in several lists of
payload_buffers" happens.
We designed the log writer so that the same buffer is not registered
to two different payload buffer lists at the same time. If it is not
satisfied, it's a bug.
What is the root cause of the "double registration" of a buffer head,
do you think?
Regards,
Ryusuke Konishi
> Many NILFS2 users were reported about strange file system corruption (for
> example):
>
> [129306.872119] NILFS: bad btree node (blocknr=185027): level = 0, flags =
> 0x0, nchildren = 768
> [129306.872127] NILFS error (device sda4): nilfs_bmap_last_key: broken bmap
> (inode number=11540)
>
> But such error messages are consequence of file system's issue that takes
> place
> more earlier. Fortunately, Jerome Poulin <[email protected]> and
> Anton Eliasson <[email protected]> were reported about another issue
> not so recently. These reports describe the issue with segctor thread's crash:
>
> [1677.310656] BUG: unable to handle kernel paging request at 0000000000004c83
> [1677.310683] IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
>
> [1677.311562] Call Trace:
> [1677.311575] [<ffffffffa024e4c5>] nilfs_segctor_do_construct+0xf25/0x1b20
> [nilfs2]
> [1677.311596] [<ffffffff81019d09>] ? sched_clock+0x9/0x10
> [1677.311614] [<ffffffffa024f3ab>] nilfs_segctor_construct+0x17b/0x290
> [nilfs2]
> [1677.311636] [<ffffffffa024f5e2>] nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
> [1677.311657] [<ffffffffa024f4c0>] ?nilfs_segctor_construct+0x290/0x290
> [nilfs2]
> [1677.311677] [<ffffffff8107cae0>] kthread+0xc0/0xd0
> [1677.311690] [<ffffffff8107ca20>] ? kthread_create_on_node+0x120/0x120
> [1677.311709] [<ffffffff816dd16c>] ret_from_fork+0x7c/0xb0
> [1677.311724] [<ffffffff8107ca20>] ? kthread_create_on_node+0x120/0x120
>
> These two issues have one reason. This reason can raise third issue too.
> Third issue results in hanging of segctor thread with eating of 100% CPU.
>
> REPRODUCING PATH:
> One of the possible way or the issue reproducing was described by
> Jérôme Poulin <[email protected]>:
>
> 1. init S to get to single user mode.
> 2. sysrq+E to make sure only my shell is running
> 3. start network-manager to get my wifi connection up
> 4. login as root and launch "screen"
> 5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
> 6. lscp | xz -9e > lscp.txt.xz
> 7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root
> /mnt/nilfs
> 8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
> 9. start a screen and launch strace -f -o find-cat.log -t find
> /mnt/nilfs -type f -exec cat {} > /dev/null \;
> 10. start a screen and launch strace -f -o apt-get.log -t apt-get update
> 11. launch the last command again as it did not crash the first time
> 12. apt-get crashes
> 13. ps aux > ps-aux-crashed.log
> 13. sysrq+W
> 14. sysrq+E wait for everything to terminate
> 15. sysrq+SUSB
>
> Simplified way of the issue reproducing is starting kernel compilation task
> and "apt-get update" in parallel.
>
> REPRODUCIBILITY:
> The issue is reproduced not stable [60% - 80%]. It is very important to have
> proper environment for the issue reproducing. The critical conditions for
> successful reproducing:
> (1) It should have big modified file by mmap() way.
> (2) This file should have the count of dirty blocks are greater that several
> segments in size (for example, two or three) from time to time during
> processing.
> (3) It should be intensive background activity of files modification in
> another thread.
>
> INVESTIGATION:
> First of all, it is possible to see that the reason of crash is not valid
> page address:
>
> [291.101244] NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0,
> bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
> [291.101249] NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783
>
> Moreover, value of b_page (0x1a82) is 6786. This value looks like segment
> number. And b_blocknr with b_size values look like block numbers. So,
> buffer_head's pointer points on not proper address value.
>
> Detailed investigation of the issue is discovered such picture:
>
> [-----------------------------SEGMENT 6783-------------------------------]
> [290.886448] NILFS [nilfs_segctor_do_construct]:2310
> nilfs_segctor_begin_construction
> [290.886460] NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
> [290.886578] NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
> [290.886601] NILFS [nilfs_segctor_do_construct]:2367
> nilfs_segctor_update_segusage
> [290.886608] NILFS [nilfs_segctor_do_construct]:2371
> nilfs_segctor_prepare_write
> [290.886631] NILFS [nilfs_segctor_do_construct]:2376
> nilfs_add_checksums_on_logs
> [290.886663] NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
> [290.886697] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024,
> segbuf->sb_segnum 6783
>
> [-----------------------------SEGMENT 6784-------------------------------]
> [290.886755] NILFS [nilfs_segctor_do_construct]:2310
> nilfs_segctor_begin_construction
> [290.886765] NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
> [290.952406] NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1,
> bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [290.952411] NILFS [nilfs_lookup_dirty_data_buffers]:783
> bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev
> ffff880221cffee8
> [290.919399] NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
> [290.932458] NILFS [nilfs_segctor_do_construct]:2367
> nilfs_segctor_update_segusage
> [290.933098] NILFS [nilfs_segctor_do_construct]:2371
> nilfs_segctor_prepare_write
> [290.939184] NILFS [nilfs_segctor_do_construct]:2376
> nilfs_add_checksums_on_logs
> [290.951133] NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
> [290.951314] NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page
> ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [290.951321] NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
> [290.951325] NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next
> ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
> [290.951534] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080,
> segbuf->sb_segnum 6784, segbuf->sb_nbio 0
> [----------] ditto
> [290.952308] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416,
> segbuf->sb_segnum 6784, segbuf->sb_nbio 15
>
> [-----------------------------SEGMENT 6785-------------------------------]
> [290.952340] NILFS [nilfs_segctor_do_construct]:2310
> nilfs_segctor_begin_construction
> [290.952364] NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
> [290.952406] NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2,
> bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [290.952411] NILFS [nilfs_lookup_dirty_data_buffers]:783
> bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev
> ffff880221cffc88
> [290.990248] NILFS [nilfs_segctor_do_construct]:2367
> nilfs_segctor_update_segusage
> [290.990265] NILFS [nilfs_segctor_do_construct]:2371
> nilfs_segctor_prepare_write
> [291.008348] NILFS [nilfs_segctor_do_construct]:2376
> nilfs_add_checksums_on_logs
> [291.018110] NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
> [291.021206] NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page
> ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
> [291.021210] NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
> [291.021214] NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next
> ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
> [291.021241] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440,
> segbuf->sb_segnum 6785, segbuf->sb_nbio 0
> [----------] ditto
> [291.021916] NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728,
> segbuf->sb_segnum 6785, segbuf->sb_nbio 12
>
> [291.021944] NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
> [291.021950] NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
> [291.021964] NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
> [291.021984] NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785
>
> [291.071861] NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0,
> bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
>
> BUG: unable to handle kernel paging request at 0000000000001a82
> IP: [<ffffffffa024d0f2>] nilfs_end_page_io+0x12/0xd0 [nilfs2]
>
> Usually, for every segment we collect dirty files in list. Then, dirty blocks
> are gathered for every dirty file, prepared for write and submitted by means
> of
> nilfs_segbuf_submit_bh() call. Finally, it takes place complete write phase
> after calling nilfs_end_bio_write() on the block layer. Buffers/pages
> are marked as not dirty on final phase and processed files removed from the
> list of dirty files.
>
> It is possible to see that we had three prepare_write and submit_bio phases
> before segbuf_wait and complete_write phase. Moreover, segments compete
> between each other for dirty blocks because on every iteration of segments
> processing dirty buffer_heads are added in several lists of payload_buffers:
>
> [SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8,
> bh->b_assoc_buffers.prev ffff880218bcdf50
> [SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8,
> bh->b_assoc_buffers.prev ffff880222cc7ee8
>
> The next pointer is the same but prev pointer has changed. It means that
> buffer_head has next pointer from one list but prev pointer from another.
> Such modification can be made several times. And, finally,
> it can be resulted in various issues: (1) segctor hanging, (2) segctor
> crashing, (3) file system metadata corruption.
>
> FIX:
> This patch adds:
> (1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
> for every proccessed dirty block;
> (2) checking of BH_Async_Write flag in nilfs_lookup_dirty_data_buffers() and
> nilfs_lookup_dirty_node_buffers();
> (3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
> nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().
>
> Reported-by: Jerome Poulin <[email protected]>
> Reported-by: Anton Eliasson <[email protected]>
> CC: Paul Fertser <[email protected]>
> CC: ARAI Shun-ichi <[email protected]>
> CC: Piotr Szymaniak <[email protected]>
> CC: Juan Barry Manuel Canham <[email protected]>
> CC: Zahid Chowdhury <[email protected]>
> CC: Elmer Zhang <[email protected]>
> CC: Kenneth Langga <[email protected]>
> Signed-off-by: Vyacheslav Dubeyko <[email protected]>
> CC: Ryusuke Konishi <[email protected]>
> ---
> fs/nilfs2/page.c | 2 ++
> fs/nilfs2/segment.c | 11 +++++++++--
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
> index 0ba6798..da27664 100644
> --- a/fs/nilfs2/page.c
> +++ b/fs/nilfs2/page.c
> @@ -94,6 +94,7 @@ void nilfs_forget_buffer(struct buffer_head *bh)
> clear_buffer_nilfs_volatile(bh);
> clear_buffer_nilfs_checked(bh);
> clear_buffer_nilfs_redirected(bh);
> + clear_buffer_async_write(bh);
> clear_buffer_dirty(bh);
> if (nilfs_page_buffers_clean(page))
> __nilfs_clear_page_dirty(page);
> @@ -429,6 +430,7 @@ void nilfs_clear_dirty_page(struct page *page, bool
> silent)
> "discard block %llu, size %zu",
> (u64)bh->b_blocknr, bh->b_size);
> }
> + clear_buffer_async_write(bh);
> clear_buffer_dirty(bh);
> clear_buffer_nilfs_volatile(bh);
> clear_buffer_nilfs_checked(bh);
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index bd88a74..9f6b486 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -665,7 +665,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct
> inode *inode,
>
> bh = head = page_buffers(page);
> do {
> - if (!buffer_dirty(bh))
> + if (!buffer_dirty(bh) || buffer_async_write(bh))
> continue;
> get_bh(bh);
> list_add_tail(&bh->b_assoc_buffers, listp);
> @@ -699,7 +699,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode
> *inode,
> for (i = 0; i < pagevec_count(&pvec); i++) {
> bh = head = page_buffers(pvec.pages[i]);
> do {
> - if (buffer_dirty(bh)) {
> + if (buffer_dirty(bh) &&
> + !buffer_async_write(bh)) {
> get_bh(bh);
> list_add_tail(&bh->b_assoc_buffers,
> listp);
> @@ -1579,6 +1580,7 @@ static void nilfs_segctor_prepare_write(struct
> nilfs_sc_info *sci)
>
> list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
> b_assoc_buffers) {
> + set_buffer_async_write(bh);
> if (bh->b_page != bd_page) {
> if (bd_page) {
> lock_page(bd_page);
> @@ -1592,6 +1594,7 @@ static void nilfs_segctor_prepare_write(struct
> nilfs_sc_info *sci)
>
> list_for_each_entry(bh, &segbuf->sb_payload_buffers,
> b_assoc_buffers) {
> + set_buffer_async_write(bh);
> if (bh == segbuf->sb_super_root) {
> if (bh->b_page != bd_page) {
> lock_page(bd_page);
> @@ -1677,6 +1680,7 @@ static void nilfs_abort_logs(struct list_head *logs,
> int err)
> list_for_each_entry(segbuf, logs, sb_list) {
> list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
> b_assoc_buffers) {
> + clear_buffer_async_write(bh);
> if (bh->b_page != bd_page) {
> if (bd_page)
> end_page_writeback(bd_page);
> @@ -1686,6 +1690,7 @@ static void nilfs_abort_logs(struct list_head *logs,
> int err)
>
> list_for_each_entry(bh, &segbuf->sb_payload_buffers,
> b_assoc_buffers) {
> + clear_buffer_async_write(bh);
> if (bh == segbuf->sb_super_root) {
> if (bh->b_page != bd_page) {
> end_page_writeback(bd_page);
> @@ -1755,6 +1760,7 @@ static void nilfs_segctor_complete_write(struct
> nilfs_sc_info *sci)
> b_assoc_buffers) {
> set_buffer_uptodate(bh);
> clear_buffer_dirty(bh);
> + clear_buffer_async_write(bh);
> if (bh->b_page != bd_page) {
> if (bd_page)
> end_page_writeback(bd_page);
> @@ -1776,6 +1782,7 @@ static void nilfs_segctor_complete_write(struct
> nilfs_sc_info *sci)
> b_assoc_buffers) {
> set_buffer_uptodate(bh);
> clear_buffer_dirty(bh);
> + clear_buffer_async_write(bh);
> clear_buffer_delay(bh);
> clear_buffer_nilfs_volatile(bh);
> clear_buffer_nilfs_redirected(bh);
> --
> 1.7.9.5
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html