Hi Sebastian,

Thank you for the report.

On Fri, Feb 20, 2015 at 09:10:35PM +0100, Sebastian Andrzej Siewior wrote:
> Hi,
> 
> I've been playing with f2fs on eMMC with v3.19. I simply cp/rm files on
> the partition for a while. The partition has a size of 510MiB and runs
> out of space now and then.
> I managed to run into this:
> 
> |[  428.035231] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
> |[  428.042896] Modules linked in:
> |[  428.045972] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G        W      3.19.0-00010-g10c11c51ffed #153
> |[  428.062166] Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> |[  428.068460] task: df230000 ti: df23e000 task.ti: df23e000
> |[  428.073866] PC is at __submit_merged_bio+0x6c/0x110
> |[  428.078746] LR is at f2fs_submit_merged_bio+0x74/0x80
> |[  428.126618] CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G        W      3.19.0-00010-g10c11c51ffed #153
> |[  428.142802] Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> …
> |[  428.256579] [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
> |[  428.264063] Exception stack(0xdf23fb48 to 0xdf23fb90)
> |[  428.269119] fb40:                   deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
> |[  428.277299] fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
> |[  428.285477] fb80: c02191f0 c0218fa0 60000013 ffffffff
> |[  428.290539] [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
> |[  428.298552] [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
> |[  428.307604] [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
> |[  428.316831] [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
> |[  428.325709] [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
> |[  428.333637] [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
> |[  428.341736] [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
> |[  428.350704] [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
> |[  428.359152] [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
> |[  428.367946] [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
> |[  428.377346] [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
> |[  428.386312] [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
> |[  428.394843] [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
> |[  428.403293] [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
> |[  428.412087] [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
> |[  428.420272] [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
> |[  428.427590] [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
> 
> The kernel config is UP, no preemption (server). Once in this state, it
> does not recover. I have more backtraces and they all look more or less
> the same. I pulled in the f2fs changes from this merge window and was able
> to reproduce the same thing. It can take hours to reproduce.
> 
> The backtrace up to sync_dirty_dir_inodes() is the same once this "stall"
> occurs. I measured the time spent within this function and added a printk()
> once 5 seconds were reached, and I saw output.
> At the point of the stall, the task state of this workqueue is:
> 
> |kworker/u2:1    R running      0    53      2 0x00000002
> |Workqueue: writeback bdi_writeback_workfn (flush-179:0)
> |[<c0012e84>] (__irq_svc) from [<c0219dd8>] (f2fs_submit_merged_bio+0x4c/0x84)
> |[<c0219dd8>] (f2fs_submit_merged_bio) from [<c021632c>] (sync_dirty_dir_inodes+0x70/0x78)
> |[<c021632c>] (sync_dirty_dir_inodes) from [<c0216448>] (write_checkpoint+0x114/0xbf0)
> |[<c0216448>] (write_checkpoint) from [<c0212314>] (f2fs_sync_fs+0x7c/0xcc)
> |[<c0212314>] (f2fs_sync_fs) from [<c0221f14>] (f2fs_balance_fs_bg+0x4c/0x68)
> |[<c0221f14>] (f2fs_balance_fs_bg) from [<c021ed28>] (f2fs_write_node_pages+0x40/0x110)
> |[<c021ed28>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
> |[<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
> |[<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
> |[<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
> |[<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
> |[<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
> |[<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
> |[<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
> |[<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
> |[<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
> 
> The 0x0…2 means TIF_NEED_RESCHED is set, but since preemption is off, the
> schedule() does not seem to happen.
> 
> I have no idea what sync_dirty_dir_inodes() is doing :) Can you imagine a
> case where it could busy loop? I could come up with this:
> 
> | void sync_dirty_dir_inodes(struct f2fs_sb_info *sbi)
> | {
> | retry:
> …
> |         head = &sbi->dir_inode_list;
> …
> |         entry = list_entry(head->next, struct inode_entry, list);
> |         inode = igrab(entry->inode);
> |         spin_unlock(&sbi->dir_inode_lock);
> …
> |         } else {
> |                 /*
> |                  * We should submit bio, since it exists several
> |                  * wribacking dentry pages in the freeing inode.
> |                  */
> |                 f2fs_submit_merged_bio(sbi, DATA, WRITE);
> ->
> |void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
> |                                 enum page_type type, int rw)
> | {
> |         io = is_read_io(rw) ? &sbi->read_io : &sbi->write_io[btype];
> |  
> |         down_write(&io->io_rwsem);
> …
> | 
> |         __submit_merged_bio(io);
> ->
> | static void __submit_merged_bio(struct f2fs_bio_info *io)
> | {
> |         if (!io->bio)
> exit here since it is set to NULL at the end of this function.
> |                 return;
> …
> |         submit_bio(fio->rw, io->bio);
> |         io->bio = NULL;
> | }               
> <-
> |         up_write(&io->io_rwsem);
> | }
> |
> <-
> |         }
> |         goto retry;
> | }
> 
> I don't know what submit_bio() does (in terms of terminating the loop
> here), but I haven't seen it in my backtraces (and assume it does not get
> invoked).
> Ah, the other process that (I think) could be running:
> | rm              R running      0  1989   1774 0x00000000
> | [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
> | [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
> | [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
> | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
> | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
> | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
> | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
> | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
> | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
> | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
> | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
> | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
> | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)

This inode is the directory (cf. do_rmdir) causing an infinite loop in
sync_dirty_dir_inodes.
sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the inode is
under eviction, it just submits bios and retries until the eviction is finished.
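
To illustrate the loop (this is only a rough sketch based on the snippet you
quoted, with the empty-list check and the non-NULL branch elided, not the
actual code): once the rm path has marked the inode I_FREEING, igrab() keeps
returning NULL, so on a UP kernel without preemption the checkpoint worker
spins here and the evicting task never gets the CPU back to finish
truncate_inode_pages_range:

retry:
	spin_lock(&sbi->dir_inode_lock);
	entry = list_entry(sbi->dir_inode_list.next, struct inode_entry, list);
	inode = igrab(entry->inode);	/* NULL while I_FREEING is set */
	spin_unlock(&sbi->dir_inode_lock);
	if (!inode) {
		/*
		 * io->bio is presumably already NULL here, so nothing new
		 * gets submitted, matching your observation that submit_bio()
		 * never shows up in the backtraces.
		 */
		f2fs_submit_merged_bio(sbi, DATA, WRITE);
		goto retry;	/* spins until the eviction finishes */
	}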

We still need to check why truncate_inode_pages_range cannot complete.

How about adding a cond_resched() right after f2fs_submit_merged_bio() in
sync_dirty_dir_inodes(), something like the sketch below?
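
This is untested and shown against the snippet you quoted rather than as a
proper patch:

	} else {
		/*
		 * We should submit bio, since it exists several
		 * wribacking dentry pages in the freeing inode.
		 */
		f2fs_submit_merged_bio(sbi, DATA, WRITE);
		/* give the evicting task a chance to run on !CONFIG_PREEMPT */
		cond_resched();
	}
	goto retry;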

Could you test this?

Thanks,

> 
> I wasn't able to reproduce the issue after enabling preemption (low-latency
> desktop). I did, however, see the code looping in sync_dirty_dir_inodes()
> for 5 seconds, but it did not lock up the machine. The box continued to
> work and file system access kept working.
> 
> Sebastian
