Hi Chao,

As I mentioned before,
https://lore.kernel.org/linux-f2fs-devel/kl1pr0601mb400309c5d62bfddde6aad8aebb...@kl1pr0601mb4003.apcprd06.prod.outlook.com/T/#mbe9a8f27626ac7ca71035e25f5502e756ab877ac
there is a potential deadlock problem if we just remove the compressed-file
condition from __should_serialize_io(). The modification looks like this:

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9f754aaef558..ffbee94924f3 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c

@@ -3218,8 +3224,6 @@ static inline bool __should_serialize_io(struct inode *inode,
        if (IS_NOQUOTA(inode))
                return false;

-       if (f2fs_need_compress_data(inode))
-               return true;
        if (wbc->sync_mode != WB_SYNC_ALL)
                return true;

This modification is a prerequisite: without it, the deadlock does not occur
for now.
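
For reference, with the hunk above applied, __should_serialize_io() would
read roughly as below. The checks not visible in the diff context are
paraphrased from my reading of fs/f2fs/data.c and may differ by kernel
version:

static inline bool __should_serialize_io(struct inode *inode,
					struct writeback_control *wbc)
{
	if (!S_ISREG(inode->i_mode))
		return false;
	if (IS_NOQUOTA(inode))
		return false;

	/* compressed-file check removed by the hunk above */

	if (wbc->sync_mode != WB_SYNC_ALL)
		return true;
	if (get_dirty_pages(inode) >= SM_I(F2FS_I_SB(inode))->min_seq_blocks)
		return true;
	return false;
}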

I did some debugging, and here is what I found: while Thread 1 is doing a
checkpoint, Thread 2 and Thread 3 interleave as follows:

Thread 2                                        Thread 3
f2fs_write_cache_pages
  ->lock_page(page)  // lock pages 0-3          f2fs_write_cache_pages
  ->f2fs_write_compressed_pages                   ->lock_page(page)  // try page 0, sleep
    ->f2fs_trylock_op failed
    ->f2fs_write_raw_pages
      ->f2fs_write_single_data_page
        ->f2fs_trylock_op failed
      ->unlock(page)  // unlock page 0
      ->cond_resched()                            ->lock_page(page)  // lock page 0, success
                                                  ->lock_page(page)  // try page 1, sleep
      ->lock_page(page)  // re-lock page 0, never succeeds

When Thread 1 does a checkpoint and takes cp_rwsem for write, Thread 2 and
Thread 3 are writing the same cluster. Thread 2 starts writing cache pages
first and locks pages 0-3. Because f2fs_trylock_op() fails,
f2fs_write_compressed_pages() and f2fs_write_single_data_page() both fail,
so Thread 2 unlocks page 0 but keeps pages 1-3 locked and schedules out. If
Thread 3 starts writing cache pages at this point, it can get the lock on
page 0, but when it tries to lock page 1 it will never succeed. Thread 2 in
turn can never re-lock page 0, so a deadlock occurs between Thread 2 and
Thread 3.
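
To make the interleaving concrete, here is a hypothetical user-space model
(plain pthreads, not f2fs code) of the same lock ordering; the mutexes
stand in for the page locks of one cluster, the sleeps only force the
interleaving above, and the program intentionally hangs, which is the
deadlock:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* pages 0-3 of one compressed cluster */
static pthread_mutex_t page[4] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

static void *writer2(void *arg)
{
	int i;

	/* f2fs_write_cache_pages(): lock the whole cluster, pages 0-3 */
	for (i = 0; i < 4; i++)
		pthread_mutex_lock(&page[i]);

	/* f2fs_trylock_op() failed: the raw-page path unlocks only
	 * page 0 and schedules out (cond_resched()) */
	pthread_mutex_unlock(&page[0]);
	sleep(2);

	/* retry: re-lock page 0, which writer3 now holds */
	printf("writer2: re-locking page 0\n");
	pthread_mutex_lock(&page[0]);	/* blocks forever */
	return NULL;
}

static void *writer3(void *arg)
{
	sleep(1);			/* run after writer2 dropped page 0 */
	pthread_mutex_lock(&page[0]);	/* succeeds */
	printf("writer3: got page 0, trying page 1\n");
	pthread_mutex_lock(&page[1]);	/* writer2 holds page 1: blocks forever */
	return NULL;
}

int main(void)
{
	pthread_t t2, t3;

	pthread_create(&t2, NULL, writer2, NULL);
	pthread_create(&t3, NULL, writer3, NULL);
	pthread_join(t2, NULL);		/* never returns: AB-BA deadlock */
	pthread_join(t3, NULL);
	return 0;
}

Build with "gcc -pthread"; after both messages print, the two threads are
blocked on each other's page lock.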

So there is a potential limitation: multiple threads must not write the same
cluster of a compressed file.

If we need to fix this problem, my thought is that we shouldn't unlock the
page in f2fs_write_single_data_page() for compressed raw pages.
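
In the same toy model above, the proposed behaviour would look like the
sketch below: writer2 keeps page 0 locked across the retry, so writer3
simply waits for the whole cluster and both threads finish. Again, this is
only a hypothetical illustration of the idea, not a patch:

/* writer2 with the proposed behaviour: on f2fs_trylock_op() failure,
 * keep page 0 locked across the retry instead of unlocking it */
static void *writer2_fixed(void *arg)
{
	int i;

	for (i = 0; i < 4; i++)
		pthread_mutex_lock(&page[i]);

	/* trylock_op failed: retry while still holding pages 0-3 */
	sleep(2);	/* cond_resched(); writer3 now blocks on page 0 */

	/* retry succeeds; write the cluster, then release it in order */
	for (i = 0; i < 4; i++)
		pthread_mutex_unlock(&page[i]);
	return NULL;
}

Swapping writer2_fixed() for writer2() in main() makes the program
terminate.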

Do we need to fix this for now?

