Hi Yunlei,

> -----Original Message-----
> From: He YunLei [mailto:heyun...@huawei.com]
> Sent: Tuesday, February 23, 2016 7:36 PM
> To: Chao Yu; jaeg...@kernel.org; linux-f2fs-devel@lists.sourceforge.net
> Cc: bintian.w...@huawei.com; 'Biao He'
> Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid hungtask problem caused by losing
> wake_up
>
> On 2016/2/23 17:15, Chao Yu wrote:
> Hi Chao,
>
> > Hi Yunlei,
> >
> >> -----Original Message-----
> >> From: He YunLei [mailto:heyun...@huawei.com]
> >> Sent: Tuesday, February 23, 2016 3:03 PM
> >> To: Chao Yu; jaeg...@kernel.org; linux-f2fs-devel@lists.sourceforge.net
> >> Cc: bintian.w...@huawei.com; 'Biao He'
> >> Subject: Re: [f2fs-dev] [PATCH] f2fs: avoid hungtask problem caused by
> >> losing wake_up
> >>
> >> On 2016/2/23 13:44, Chao Yu wrote:
> >>> Hi Yunlei,
> >> Hi Chao,
> >>>
> >>>> -----Original Message-----
> >>>> From: Yunlei He [mailto:heyun...@huawei.com]
> >>>> Sent: Tuesday, February 23, 2016 12:08 PM
> >>>> To: chao2...@samsung.com; jaeg...@kernel.org;
> >>>> linux-f2fs-devel@lists.sourceforge.net
> >>>> Cc: bintian.w...@huawei.com; Yunlei He; Biao He
> >>>> Subject: [f2fs-dev] [PATCH] f2fs: avoid hungtask problem caused by
> >>>> losing wake_up
> >>>>
> >>>> The D state of wait_on_all_pages_writeback should be waken by
> >>>> function f2fs_write_end_io when all writeback pages have been
> >>>> succesfully written to device. It's possible that wake_up comes
> >>>> between get_pages and io_schedule. Maybe in this case it will
> >>>> lost wake_up and still in D state even if all pages have been
> >>>> write back to device, and finally, the whole system will be into
> >>>> the hungtask state.
> >>>
> >>> I haven't encountered such issue so far, do you suffer this in real
> >>> world?
> >>>
> >> yes, I have encounter it, the whole file system is blocked at function
> >> wait_on_all_pages_writeback beyond 120s when write cp, and no error
> >> reported by storage device driver.
> >
> > Could this reproducible?
> > If it could, could you please share the details.
> > And did this occur in a huge size f2fs image?
> >
> >>>>
> >>>> 		if (!get_pages(sbi, F2FS_WRITEBACK))
> >>>> 			break;
> >>>> 		<--------- wake_up
> >>>
> >>> wake_up will put all tasks linked in sbi->cp_wait on run-queue, so
> >>> here it should be save to call io_schedule, after being rescheduled,
> >>> it will get the chance to check above condition to break out.
> >>>
> >>> Thanks,
> >>
> >> Here, we just doubt something weird may cause wait_on_all_pages_writeback
> >> could not be waken. Wake_up trigger only one time by last bio's end_io
> >> function, if the thread happen to miss it, the thread will be in D state
> >> forever. So we change the code to make wait_on_all_pages_writeback awaken
> >> periodically, then check the condition.
> >
> > Got it.
> >
> > The patch can fix issue that checkpointer will wait forever in case of
> > write_end_io was failed to call wake_up for some reason.

I found one possible case:

CPU0:					CPU1:
- write_checkpoint
 - do_checkpoint
  - wait_on_all_pages_writeback
					- f2fs_write_end_io
					 - wake_up
					this is the last writeback page, but
					there is no sleeper in the sbi->cp_wait
					wait queue, so wake_up is not called.
   - prepare_to_wait(TASK_UNINTERRUPTIBLE)
   Here the current task is preempted, but
   there is no waker left to wake it up,
   since the last write_end_io has already
   run. So the current task will sleep
   forever.
   - io_schedule

What do you think? If this is right, the following patch can fix the issue.

---
 fs/f2fs/checkpoint.c | 14 +++++++++-----
 fs/f2fs/data.c       |  9 +++++++--
 fs/f2fs/f2fs.h       |  3 ++-
 fs/f2fs/super.c      |  1 +
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 9d277f8..9446c3d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -914,15 +914,19 @@ static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
 {
 	DEFINE_WAIT(wait);
 
-	for (;;) {
-		prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
+	spin_lock(&sbi->cp_wb_lock);
 
-		if (!get_pages(sbi, F2FS_WRITEBACK))
-			break;
+	while (get_pages(sbi, F2FS_WRITEBACK)) {
+		prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
 
+		spin_unlock(&sbi->cp_wb_lock);
 		io_schedule();
+		spin_lock(&sbi->cp_wb_lock);
+
+		finish_wait(&sbi->cp_wait, &wait);
 	}
-	finish_wait(&sbi->cp_wait, &wait);
+
+	spin_unlock(&sbi->cp_wb_lock);
 }
 
 static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index e5c762b..e31deb97 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -59,6 +59,7 @@ static void f2fs_write_end_io(struct bio *bio)
 {
 	struct f2fs_sb_info *sbi = bio->bi_private;
 	struct bio_vec *bvec;
+	unsigned long flags;
 	int i;
 
 	bio_for_each_segment_all(bvec, bio, i) {
@@ -74,8 +75,12 @@ static void f2fs_write_end_io(struct bio *bio)
 		dec_page_count(sbi, F2FS_WRITEBACK);
 	}
 
-	if (!get_pages(sbi, F2FS_WRITEBACK) && wq_has_sleeper(&sbi->cp_wait))
-		wake_up(&sbi->cp_wait);
+	if (!get_pages(sbi, F2FS_WRITEBACK)) {
+		spin_lock_irqsave(&sbi->cp_wb_lock, flags);
+		if (wq_has_sleeper(&sbi->cp_wait))
+			wake_up(&sbi->cp_wait);
+		spin_unlock_irqrestore(&sbi->cp_wb_lock, flags);
+	}
 
 	bio_put(bio);
 }
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 0d25430..fd47984 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -727,7 +727,8 @@ struct f2fs_sb_info {
 	struct rw_semaphore cp_rwsem;		/* blocking FS operations */
 	struct rw_semaphore node_write;		/* locking node writes */
 	struct mutex writepages;		/* mutex for writepages() */
-	wait_queue_head_t cp_wait;
+	wait_queue_head_t cp_wait;		/* for wait pages writeback */
+	spinlock_t cp_wb_lock;			/* for protect cp_wait */
 	unsigned long last_time[MAX_TIME];	/* to store time in jiffies */
 	long interval_time[MAX_TIME];		/* to store thresholds */
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 7b62016..5316c7a 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1374,6 +1374,7 @@ try_onemore:
 
 	init_rwsem(&sbi->cp_rwsem);
 	init_waitqueue_head(&sbi->cp_wait);
+	spin_lock_init(&sbi->cp_wb_lock);
 	init_sb_info(sbi);
 
 	/* get an inode for meta space */
-- 
2.7.0

> >
> > But I doubt more that the reason we are stuck is there are remained pages
> > cached in bio buffer without being submitted. To make sure, maybe in
> > wait_on_all_pages_writeback we could add print info to see whether
> > sbi->write_io[].bio is valid or not.
> >
> We use tool dump f2fs_sb_info information and find that:
>
> write_io[DATA].bio = 0;
> write_io[NODE].bio = 0;
> write_io[META].bio = 0;
>
> nr_pages[F2FS_WRITEBACK] = 0;
> nr_pages[F2FS_DIRTY_DENTS] = 0;
> nr_pages[F2FS_DIRTY_NODES] = 13;

Weird, the dirty nodes count should be 0.

Thanks

> nr_pages[F2FS_DIRTY_META] = 0;
> nr_pages[F2FS_INMEM_PAGES] = 0;
>
> So we believe that the block device is ok!
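
For illustration only, the idea of serializing the "check the counter, then
sleep" step against the waker can be mimicked in user space. This is a rough
sketch under assumptions, not f2fs code: it uses POSIX threads instead of
kernel wait queues, and wb_state, wait_all_written, end_io and run_rounds are
made-up names. The waiter tests the counter while holding the same lock the
completer takes before signalling, and pthread_cond_wait() drops that lock and
goes to sleep atomically, so the signal cannot land between the check and the
sleep.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins for sbi->cp_wb_lock, sbi->cp_wait and
 * the F2FS_WRITEBACK page counter; these are not the kernel structures. */
struct wb_state {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	int writeback_pages;
};

/* Rough analogue of wait_on_all_pages_writeback(): the predicate is tested
 * with the lock held, and pthread_cond_wait() releases the lock and blocks
 * in one atomic step, re-taking the lock before it returns. */
static void wait_all_written(struct wb_state *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->writeback_pages > 0)
		pthread_cond_wait(&s->wait, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Rough analogue of f2fs_write_end_io(): completes one page and, if it was
 * the last one, wakes the waiter while still holding the lock. */
static void *end_io(void *arg)
{
	struct wb_state *s = arg;

	pthread_mutex_lock(&s->lock);
	if (--s->writeback_pages == 0)
		pthread_cond_broadcast(&s->wait);
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Races one completer thread against the waiter `rounds` times.
 * Returns 0 when every round finished, -1 if a thread could not be created. */
int run_rounds(int rounds)
{
	for (int i = 0; i < rounds; i++) {
		struct wb_state s;
		pthread_t t;

		pthread_mutex_init(&s.lock, NULL);
		pthread_cond_init(&s.wait, NULL);
		s.writeback_pages = 1;

		if (pthread_create(&t, NULL, end_io, &s) != 0)
			return -1;

		wait_all_written(&s);	/* must not hang, whoever runs first */
		pthread_join(t, NULL);
		assert(s.writeback_pages == 0);

		pthread_cond_destroy(&s.wait);
		pthread_mutex_destroy(&s.lock);
	}
	return 0;
}
```

Calling run_rounds() with a few hundred rounds from a small main() built with
-pthread exercises both orderings: if the completer runs first, the waiter sees
a zero counter and never sleeps; if the waiter runs first, it is already
queued on the condition variable when the broadcast arrives.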
>
> Thanks,
>
> > Thanks,
> >
> >>
> >>>
> >>>> 		io_schedule();
> >>>>
> >>>> Signed-off-by: Yunlei He <heyun...@huawei.com>
> >>>> Signed-off-by: Biao He <hebi...@huawei.com>
> >>>> ---
> >>>>  fs/f2fs/checkpoint.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>>> index 2bac8a1..f55355d 100644
> >>>> --- a/fs/f2fs/checkpoint.c
> >>>> +++ b/fs/f2fs/checkpoint.c
> >>>> @@ -920,7 +920,7 @@ static void wait_on_all_pages_writeback(struct
> >>>> f2fs_sb_info *sbi)
> >>>>  		if (!get_pages(sbi, F2FS_WRITEBACK))
> >>>>  			break;
> >>>>
> >>>> -		io_schedule();
> >>>> +		io_schedule_timeout(5*HZ);
> >>>>  	}
> >>>>  	finish_wait(&sbi->cp_wait, &wait);
> >>>>  }
> >>>> --
> >>>> 1.9.1

_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel