I think one way to solve this issue is to add an extra mount option,
e.g. compress_mempool. Once the user specifies this option, we can allocate
and reserve memory for one single workspace of the compression algorithm.
During compress/decompress, once we can not grab any more memory from the
system, we can let the process wait on that private mempool. It may
increase the latency of writing compressed data, but it decreases the
failure ratio on low-end devices.
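
To make the idea concrete, here is a rough userspace model of the fallback described above. It is only a sketch: the names (ws_pool, ws_get, ws_put, ws_alloc_fail) are made up for illustration, it uses pthreads rather than the kernel's mempool API, and it is not taken from any f2fs patch.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of the proposed "compress_mempool" behaviour. */
struct ws_pool {
	pthread_mutex_t lock;
	pthread_cond_t released;
	void *reserved;	/* the single workspace set aside up front */
	size_t size;	/* workspace size in bytes */
	int busy;	/* non-zero while the reserved workspace is handed out */
};

/* Reserve one workspace in advance; returns 0 on success, -1 on failure. */
int ws_pool_init(struct ws_pool *p, size_t size)
{
	p->reserved = malloc(size);
	if (!p->reserved)
		return -1;
	p->size = size;
	p->busy = 0;
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->released, NULL);
	return 0;
}

/* Stand-in for an allocator that always fails, to model memory pressure. */
void *ws_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/* Try the normal allocator first; if it fails, wait for the reserved workspace. */
void *ws_get(struct ws_pool *p, void *(*try_alloc)(size_t))
{
	void *ws = try_alloc(p->size);

	if (ws)
		return ws;

	pthread_mutex_lock(&p->lock);
	while (p->busy)
		pthread_cond_wait(&p->released, &p->lock);
	p->busy = 1;
	pthread_mutex_unlock(&p->lock);
	return p->reserved;
}

/* Release a workspace: wake one waiter if it was the reserved one, else free it. */
void ws_put(struct ws_pool *p, void *ws)
{
	if (ws != p->reserved) {
		free(ws);
		return;
	}
	pthread_mutex_lock(&p->lock);
	p->busy = 0;
	pthread_cond_signal(&p->released);
	pthread_mutex_unlock(&p->lock);
}
```

The trade-off is the one mentioned above: under memory pressure all writers serialize on the one reserved workspace, so latency goes up, but the operation no longer fails outright.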

On 2020/9/1 11:07, 5kft wrote:
Thanks for the patch - I applied it against 5.9-rc2, and it seems to help...:  
The test I am using for this is to copy the entire rootfs tree to a 
zstd-compressed f2fs partition.  Previously, even a vm.min_free_kbytes of 32768 
wasn't enough to avoid the allocation traps for the copy; with this patch I'm 
able to complete the entire copy without an error at vm.min_free_kbytes=32768.

However, if I try vm.min_free_kbytes=16384 (for example), then it still runs 
out of memory and logs many traps.  It still seems rather excessive to require 
so much available memory...?

Example traps at the system default vm.min_free_kbytes of ~2800 (following 
board boot):

[  141.863780] kworker/u8:4: page allocation failure: order:6, 
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[  141.863810] CPU: 3 PID: 1444 Comm: kworker/u8:4 Tainted: G         C        
5.9.0-rc2-sunxi #trunk
[  141.863812] Hardware name: Allwinner sun8i Family
[  141.863833] Workqueue: writeback wb_workfn (flush-179:0)
[  141.863859] [<c010d415>] (unwind_backtrace) from [<c01097a5>] 
(show_stack+0x11/0x14)
[  141.863872] [<c01097a5>] (show_stack) from [<c0573da1>] 
(dump_stack+0x75/0x84)
[  141.863888] [<c0573da1>] (dump_stack) from [<c0246163>] 
(warn_alloc+0xa3/0x104)
[  141.863899] [<c0246163>] (warn_alloc) from [<c0246d71>] 
(__alloc_pages_nodemask+0xbad/0xc58)
[  141.863911] [<c0246d71>] (__alloc_pages_nodemask) from [<c022a09f>] 
(kmalloc_order+0x23/0x50)
[  141.863920] [<c022a09f>] (kmalloc_order) from [<c022a0e5>] 
(kmalloc_order_trace+0x19/0x90)
[  141.863933] [<c022a0e5>] (kmalloc_order_trace) from [<c0481519>] 
(zstd_init_compress_ctx+0x51/0xfc)
[  141.863946] [<c0481519>] (zstd_init_compress_ctx) from [<c048304b>] 
(f2fs_write_multi_pages+0x27b/0x6a0)
[  141.863961] [<c048304b>] (f2fs_write_multi_pages) from [<c04699e3>] 
(f2fs_write_cache_pages+0x3bf/0x538)
[  141.863971] [<c04699e3>] (f2fs_write_cache_pages) from [<c0469d8f>] 
(f2fs_write_data_pages+0x233/0x264)
[  141.863985] [<c0469d8f>] (f2fs_write_data_pages) from [<c02139b9>] 
(do_writepages+0x35/0x98)
[  141.863995] [<c02139b9>] (do_writepages) from [<c02947ef>] 
(__writeback_single_inode+0x2f/0x358)
[  141.864004] [<c02947ef>] (__writeback_single_inode) from [<c0294c9d>] 
(writeback_sb_inodes+0x185/0x378)
[  141.864012] [<c0294c9d>] (writeback_sb_inodes) from [<c0294ec1>] 
(__writeback_inodes_wb+0x31/0x88)
[  141.864019] [<c0294ec1>] (__writeback_inodes_wb) from [<c029510b>] 
(wb_writeback+0x1f3/0x264)
[  141.864026] [<c029510b>] (wb_writeback) from [<c0296053>] 
(wb_workfn+0x2a3/0x3a4)
[  141.864035] [<c0296053>] (wb_workfn) from [<c0130313>] 
(process_one_work+0x15f/0x3b0)
[  141.864043] [<c0130313>] (process_one_work) from [<c013065f>] 
(worker_thread+0xfb/0x3e0)
[  141.864053] [<c013065f>] (worker_thread) from [<c0135407>] 
(kthread+0xeb/0x10c)
[  141.864063] [<c0135407>] (kthread) from [<c0100159>] 
(ret_from_fork+0x11/0x38)
[  141.864067] Exception stack(0xcf153fb0 to 0xcf153ff8)
[  141.864073] 3fa0:                                     00000000 00000000 
00000000 00000000
[  141.864079] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
[  141.864084] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  141.864089] Mem-Info:
[  141.864103] active_anon:105 inactive_anon:9374 isolated_anon:0
                 active_file:12581 inactive_file:77234 isolated_file:32
                 unevictable:4 dirty:11187 writeback:174
                 slab_reclaimable:3566 slab_unreclaimable:6038
                 mapped:5698 shmem:414 pagetables:348 bounce:0
                 free:10114 free_pcp:223 free_cma:8329
[  141.864114] Node 0 active_anon:420kB inactive_anon:37496kB 
active_file:50324kB inactive_file:308936kB unevictable:16kB isolated(anon):0kB 
isolated(file):128kB mapped:22792kB dirty:44748kB writeback:696kB shmem:1656kB 
writeback_tmp:0kB kernel_stack:1216kB all_unreclaimable? no
[  141.864127] Normal free:40456kB min:6904kB low:7604kB high:8304kB 
reserved_highatomic:0KB active_anon:420kB inactive_anon:37496kB 
active_file:50248kB inactive_file:308768kB unevictable:16kB 
writepending:45608kB present:524288kB managed:503884kB mlocked:16kB 
pagetables:1392kB bounce:0kB free_pcp:892kB local_pcp:176kB free_cma:33316kB
[  141.864129] lowmem_reserve[]: 0 0 0
[  141.864135] Normal: 88*4kB (UMEC) 107*8kB (UMEC) 51*16kB (UMEC) 29*32kB 
(UMEC) 13*64kB (UMEC) 2*128kB (UE) 3*256kB (UC) 2*512kB (U) 2*1024kB (U) 
0*2048kB 8*4096kB (C) = 40648kB
[  141.864162] 90296 total pagecache pages
[  141.864168] 0 pages in swap cache
[  141.864171] Swap cache stats: add 0, delete 0, find 0/0
[  141.864173] Free swap  = 251940kB
[  141.864175] Total swap = 251940kB
[  141.864177] 131072 pages RAM
[  141.864179] 0 pages HighMem/MovableOnly
[  141.864181] 5101 pages reserved
[  141.864184] 32768 pages cma reserved
[  155.171118] warn_alloc: 23 callbacks suppressed
[  155.171143] kworker/u8:4: page allocation failure: order:6, 
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[  155.171168] CPU: 1 PID: 1444 Comm: kworker/u8:4 Tainted: G         C        
5.9.0-rc2-sunxi #trunk
[  155.171172] Hardware name: Allwinner sun8i Family
[  155.171195] Workqueue: writeback wb_workfn (flush-179:0)
[  155.171229] [<c010d415>] (unwind_backtrace) from [<c01097a5>] 
(show_stack+0x11/0x14)
[  155.171243] [<c01097a5>] (show_stack) from [<c0573da1>] 
(dump_stack+0x75/0x84)
[  155.171266] [<c0573da1>] (dump_stack) from [<c0246163>] 
(warn_alloc+0xa3/0x104)
[  155.171281] [<c0246163>] (warn_alloc) from [<c0246d71>] 
(__alloc_pages_nodemask+0xbad/0xc58)
[  155.171294] [<c0246d71>] (__alloc_pages_nodemask) from [<c022a09f>] 
(kmalloc_order+0x23/0x50)
[  155.171304] [<c022a09f>] (kmalloc_order) from [<c022a0e5>] 
(kmalloc_order_trace+0x19/0x90)
[  155.171320] [<c022a0e5>] (kmalloc_order_trace) from [<c0481519>] 
(zstd_init_compress_ctx+0x51/0xfc)
[  155.171334] [<c0481519>] (zstd_init_compress_ctx) from [<c048304b>] 
(f2fs_write_multi_pages+0x27b/0x6a0)
[  155.171349] [<c048304b>] (f2fs_write_multi_pages) from [<c04699e3>] 
(f2fs_write_cache_pages+0x3bf/0x538)
[  155.171359] [<c04699e3>] (f2fs_write_cache_pages) from [<c0469d8f>] 
(f2fs_write_data_pages+0x233/0x264)
[  155.171374] [<c0469d8f>] (f2fs_write_data_pages) from [<c02139b9>] 
(do_writepages+0x35/0x98)
[  155.171385] [<c02139b9>] (do_writepages) from [<c02947ef>] 
(__writeback_single_inode+0x2f/0x358)
[  155.171394] [<c02947ef>] (__writeback_single_inode) from [<c0294c9d>] 
(writeback_sb_inodes+0x185/0x378)
[  155.171402] [<c0294c9d>] (writeback_sb_inodes) from [<c0294ec1>] 
(__writeback_inodes_wb+0x31/0x88)
[  155.171409] [<c0294ec1>] (__writeback_inodes_wb) from [<c029510b>] 
(wb_writeback+0x1f3/0x264)
[  155.171417] [<c029510b>] (wb_writeback) from [<c0295ffd>] 
(wb_workfn+0x24d/0x3a4)
[  155.171428] [<c0295ffd>] (wb_workfn) from [<c0130313>] 
(process_one_work+0x15f/0x3b0)
[  155.171437] [<c0130313>] (process_one_work) from [<c013065f>] 
(worker_thread+0xfb/0x3e0)
[  155.171447] [<c013065f>] (worker_thread) from [<c0135407>] 
(kthread+0xeb/0x10c)
[  155.171457] [<c0135407>] (kthread) from [<c0100159>] 
(ret_from_fork+0x11/0x38)
[  155.171462] Exception stack(0xcf153fb0 to 0xcf153ff8)
[  155.171468] 3fa0:                                     00000000 00000000 
00000000 00000000
[  155.171474] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
[  155.171480] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  155.171488] Mem-Info:
[  155.171504] active_anon:105 inactive_anon:9403 isolated_anon:0
                 active_file:17189 inactive_file:52888 isolated_file:0
                 unevictable:4 dirty:11785 writeback:50
                 slab_reclaimable:4217 slab_unreclaimable:6052
                 mapped:5706 shmem:414 pagetables:349 bounce:0
                 free:29132 free_pcp:340 free_cma:27347
[  155.171516] Node 0 active_anon:420kB inactive_anon:37612kB 
active_file:68756kB inactive_file:211552kB unevictable:16kB isolated(anon):0kB 
isolated(file):0kB mapped:22824kB dirty:47140kB writeback:200kB shmem:1656kB 
writeback_tmp:0kB kernel_stack:1216kB all_unreclaimable? no
[  155.171531] Normal free:116528kB min:6904kB low:7604kB high:8304kB 
reserved_highatomic:0KB active_anon:420kB inactive_anon:37612kB 
active_file:68680kB inactive_file:211696kB unevictable:16kB 
writepending:47352kB present:524288kB managed:503884kB mlocked:16kB 
pagetables:1396kB bounce:0kB free_pcp:1356kB local_pcp:8kB free_cma:109388kB
[  155.171534] lowmem_reserve[]: 0 0 0
[  155.171540] Normal: 365*4kB (UMEC) 188*8kB (UMEC) 153*16kB (UMC) 111*32kB 
(UMC) 73*64kB (UMC) 44*128kB (UC) 33*256kB (UC) 18*512kB (UC) 18*1024kB (UC) 
6*2048kB (C) 12*4096kB (C) = 116804kB
[  155.171568] 70535 total pagecache pages
[  155.171576] 0 pages in swap cache
[  155.171579] Swap cache stats: add 0, delete 0, find 0/0
[  155.171581] Free swap  = 251940kB
[  155.171583] Total swap = 251940kB
[  155.171585] 131072 pages RAM
[  155.171587] 0 pages HighMem/MovableOnly
[  155.171590] 5101 pages reserved
[  155.171592] 32768 pages cma reserved
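
For anyone reading the traps above: "order:6" means the allocator was asked for 2^6 physically contiguous pages. The small helper below is just my own arithmetic sketch (the function names are made up), assuming 4 kB pages, which matches "131072 pages RAM" on this 512MB board.

```c
#include <stddef.h>

/* Assumption: 4 kB pages, consistent with 131072 pages == 512MB in the logs. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bytes of contiguous memory requested by an order-N page allocation. */
unsigned long order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Smallest order whose block is large enough to hold `size` bytes. */
unsigned int bytes_to_order(unsigned long size)
{
	unsigned int order = 0;

	while (order_to_bytes(order) < size)
		order++;
	return order;
}
```

So each order:6 failure is a request for one contiguous 256 kB block (order_to_bytes(6) is 262144), and the order:9 failure in the decompression trap further down the thread is a request for 2 MB.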


On Mon, Aug 31, 2020, at 6:39 PM, Chao Yu wrote:
Hi,

We should align the max compress window size of zstd to the cluster size of
the current inode. By default, the cluster size is 16KB (log size is 2), so
this can significantly reduce the size of the allocated memory.

So, could you please try below patch first?

  From c4bf178e5133525027d817a2ac542db6f5621c4f Mon Sep 17 00:00:00 2001
From: Chao Yu <[email protected]>
Date: Tue, 1 Sep 2020 09:29:08 +0800
Subject: [PATCH] fix memory allocation failure on zstd decompression

Signed-off-by: Chao Yu <[email protected]>
---
   fs/f2fs/compress.c | 7 ++++---
   fs/f2fs/f2fs.h     | 2 +-
   2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index df097c4a71e1..357303d8514b 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -382,16 +382,17 @@ static int zstd_init_decompress_ctx(struct decompress_io_ctx *dic)
        ZSTD_DStream *stream;
        void *workspace;
        unsigned int workspace_size;
+       unsigned int max_window_size =
+                       MAX_COMPRESS_WINDOW_SIZE(dic->log_cluster_size);

-       workspace_size = ZSTD_DStreamWorkspaceBound(MAX_COMPRESS_WINDOW_SIZE);
+       workspace_size = ZSTD_DStreamWorkspaceBound(max_window_size);

        workspace = f2fs_kvmalloc(F2FS_I_SB(dic->inode),
                                        workspace_size, GFP_NOFS);
        if (!workspace)
                return -ENOMEM;

-       stream = ZSTD_initDStream(MAX_COMPRESS_WINDOW_SIZE,
-                                       workspace, workspace_size);
+       stream = ZSTD_initDStream(max_window_size, workspace, workspace_size);
        if (!stream) {
                printk_ratelimited("%sF2FS-fs (%s): %s ZSTD_initDStream failed\n",
                                KERN_ERR, F2FS_I_SB(dic->inode)->sb->s_id,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 21f86001bb3a..d210809292f9 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1419,7 +1419,7 @@ struct decompress_io_ctx {
   #define NULL_CLUSTER                 ((unsigned int)(~0))
   #define MIN_COMPRESS_LOG_SIZE                2
   #define MAX_COMPRESS_LOG_SIZE                8
-#define MAX_COMPRESS_WINDOW_SIZE       ((PAGE_SIZE) << MAX_COMPRESS_LOG_SIZE)
+#define MAX_COMPRESS_WINDOW_SIZE(log_size)     ((PAGE_SIZE) << (log_size))

   struct f2fs_sb_info {
        struct super_block *sb;                 /* pointer to VFS super block */
--
2.26.2
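
As a quick sanity check of the macro change in the patch above, the window sizes can be worked out in plain C. This is only an illustrative sketch: SKETCH_PAGE_SIZE and window_size_bytes() are made-up names, 4 kB pages are assumed, and the actual workspace returned by ZSTD_DStreamWorkspaceBound() is larger than the window and is not computed here.

```c
/* Assumption: 4 kB pages; the kernel uses PAGE_SIZE here. */
#define SKETCH_PAGE_SIZE		4096UL
#define MIN_COMPRESS_LOG_SIZE		2
#define MAX_COMPRESS_LOG_SIZE		8

/* Same shape as the patched macro: the window follows the given log size. */
#define MAX_COMPRESS_WINDOW_SIZE(log_size) ((SKETCH_PAGE_SIZE) << (log_size))

/* Window size in bytes for a given log cluster size. */
unsigned long window_size_bytes(unsigned int log_cluster_size)
{
	return MAX_COMPRESS_WINDOW_SIZE(log_cluster_size);
}
```

With the default log size of 2 this gives 16384 bytes (16KB), whereas the old fixed macro always used MAX_COMPRESS_LOG_SIZE, i.e. 4096 << 8 = 1MB. A log size of 5 would give 128KB.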



On 2020/9/1 2:14, 5kft wrote:
Sounds good :-)  Perhaps it's simply that zstd needs a lot of memory to operate, however 
it's unfortunate that it doesn't work on smaller platforms "out of the box" 
like lz4 does.  Should there be a note or guidance of some sort regarding this for 
smaller embedded platforms?

On Mon, Aug 31, 2020, at 11:04 AM, Jaegeuk Kim wrote:
Let me add more f2fs folks. :)

On 08/27, 5kft wrote:
(Note that for testing this I backported f2fs from 5.9-rc2 into 5.8.5, as I 
don't have 5.9 working on these boards yet.)

On Thu, Aug 27, 2020, at 7:39 AM, 5kft wrote:
Quick update - I encounter the problem with f2fs zstd compression in the 
mainline 5.9-rc2 kernel as well - e.g.,

[   67.668529] F2FS-fs (mmcblk0p1): Found nat_bits in checkpoint
[   68.339021] F2FS-fs (mmcblk0p1): Mounted with checkpoint version = 76732978
[   93.862327] kworker/u8:2: page allocation failure: order:6, 
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[   93.862360] CPU: 0 PID: 187 Comm: kworker/u8:2 Tainted: G         C        
5.8.5-sunxi #trunk
[   93.862364] Hardware name: Allwinner sun8i Family
[   93.862388] Workqueue: writeback wb_workfn (flush-179:0)
[   93.862424] [<c010d6d5>] (unwind_backtrace) from [<c0109a55>] 
(show_stack+0x11/0x14)
[   93.862439] [<c0109a55>] (show_stack) from [<c056eae9>] 
(dump_stack+0x75/0x84)
[   93.862456] [<c056eae9>] (dump_stack) from [<c0243b8f>] 
(warn_alloc+0xa3/0x104)
[   93.862469] [<c0243b8f>] (warn_alloc) from [<c0244777>] 
(__alloc_pages_nodemask+0xb87/0xc40)
[   93.862483] [<c0244777>] (__alloc_pages_nodemask) from [<c02267fd>] 
(kmalloc_order+0x19/0x38)
[   93.862492] [<c02267fd>] (kmalloc_order) from [<c0226835>] 
(kmalloc_order_trace+0x19/0x90)
[   93.862506] [<c0226835>] (kmalloc_order_trace) from [<c047ddf5>] 
(zstd_init_compress_ctx+0x51/0xfc)
[   93.862518] [<c047ddf5>] (zstd_init_compress_ctx) from [<c047f90b>] 
(f2fs_write_multi_pages+0x27b/0x6a0)
[   93.862532] [<c047f90b>] (f2fs_write_multi_pages) from [<c046630d>] 
(f2fs_write_cache_pages+0x415/0x538)
[   93.862542] [<c046630d>] (f2fs_write_cache_pages) from [<c0466663>] 
(f2fs_write_data_pages+0x233/0x264)
[   93.862555] [<c0466663>] (f2fs_write_data_pages) from [<c0210ded>] 
(do_writepages+0x35/0x98)
[   93.862571] [<c0210ded>] (do_writepages) from [<c0290c4f>] 
(__writeback_single_inode+0x2f/0x358)
[   93.862584] [<c0290c4f>] (__writeback_single_inode) from [<c02910fd>] 
(writeback_sb_inodes+0x185/0x378)
[   93.862594] [<c02910fd>] (writeback_sb_inodes) from [<c0291321>] 
(__writeback_inodes_wb+0x31/0x88)
[   93.862603] [<c0291321>] (__writeback_inodes_wb) from [<c029156b>] 
(wb_writeback+0x1f3/0x264)
[   93.862612] [<c029156b>] (wb_writeback) from [<c0292461>] 
(wb_workfn+0x24d/0x3a4)
[   93.862624] [<c0292461>] (wb_workfn) from [<c0130b2f>] 
(process_one_work+0x15f/0x3b0)
[   93.862634] [<c0130b2f>] (process_one_work) from [<c0130e7b>] 
(worker_thread+0xfb/0x3e0)
[   93.862646] [<c0130e7b>] (worker_thread) from [<c0135c3b>] 
(kthread+0xeb/0x10c)
[   93.862656] [<c0135c3b>] (kthread) from [<c0100159>] 
(ret_from_fork+0x11/0x38)
[   93.862661] Exception stack(0xd4167fb0 to 0xd4167ff8)
[   93.862667] 7fa0:                                     00000000 00000000 
00000000 00000000
[   93.862674] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
[   93.862680] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   93.862686] Mem-Info:
[   93.862699] active_anon:3457 inactive_anon:6470 isolated_anon:32
                  active_file:14148 inactive_file:75224 isolated_file:0
                  unevictable:4 dirty:10374 writeback:151
                  slab_reclaimable:4946 slab_unreclaimable:8951
                  mapped:5557 shmem:414 pagetables:332 bounce:0
                  free:5946 free_pcp:118 free_cma:4292
[   93.862709] Node 0 active_anon:13828kB inactive_anon:26032kB 
active_file:56592kB inactive_file:300896kB unevictable:16kB isolated(anon):0kB 
isolated(file):0kB mapped:22228kB dirty:41496kB writeback:604kB shmem:1656kB 
writeback_tmp:0kB all_unreclaimable? no
[   93.862725] Normal free:23784kB min:6904kB low:7604kB high:8304kB 
reserved_highatomic:0KB active_anon:13956kB inactive_anon:25800kB 
active_file:56592kB inactive_file:301212kB unevictable:16kB 
writepending:42024kB present:524288kB managed:503888kB mlocked:16kB 
kernel_stack:1200kB pagetables:1328kB bounce:0kB free_pcp:472kB local_pcp:196kB 
free_cma:17168kB
[   93.862727] lowmem_reserve[]: 0 0 0
[   93.862734] Normal: 95*4kB (UMEC) 122*8kB (UMEC) 45*16kB (UMEC) 32*32kB 
(UMEC) 17*64kB (UMEC) 7*128kB (UMEC) 4*256kB (U) 3*512kB (UC) 0*1024kB 0*2048kB 
4*4096kB (C) = 24028kB
[   93.862762] 89790 total pagecache pages
[   93.862768] 0 pages in swap cache
[   93.862771] Swap cache stats: add 0, delete 0, find 0/0
[   93.862773] Free swap  = 251940kB
[   93.862775] Total swap = 251940kB
[   93.862777] 131072 pages RAM
[   93.862780] 0 pages HighMem/MovableOnly
[   93.862782] 5100 pages reserved
[   93.862784] 32768 pages cma reserved

I haven't tried lowering MAX_COMPRESS_LOG_SIZE in this kernel yet but will test 
this when I can.

On Tue, Aug 25, 2020, at 1:31 PM, 5kft wrote:
Note that I don't think that this particular problem is a memleak as it happens 
very quickly when simply copying files to the zstd-mounted filesystem - but I 
haven't been able to compare the 5.8.3 changes to 5.9-rc1 yet.  This particular 
board boots up with vm.min_free_kbytes = 2406, which seems pretty low, but the 
board only has 512MB RAM on it total.  Kind of crazy I know, but it's a good 
test case for this problem :-)  Also, again lz4 compression works fine at this 
low value.

I'm not sure that this particular change (lowering MAX_COMPRESS_LOG_SIZE) helps 
significantly.  I'm still seeing the failures even with vm.min_free_kbytes = 
32768 (and this seems like a rather high value compared to the default).

On Tue, Aug 25, 2020, at 12:43 PM, Jaegeuk Kim wrote:
So, if there's no memleak in f2fs but we need to do something like that, I feel 
that something is misconfigured in f2fs wrt zstd.
I took a look at the zstd initialization flow, and it seems f2fs is asking for 
too much memory for the workspace compared with btrfs.
Could you please check whether replacing the below "8" with "5" mitigates the problem? 
("5" is used in btrfs.)

In fs/f2fs/f2fs.h,
#define MAX_COMPRESS_LOG_SIZE           8



On Tue, Aug 25, 2020 at 12:30 PM, 5kft <[email protected]> wrote:
Will do!  Quick question - should these changes handle a low 
"vm.min_free_kbytes" situation with f2fs?  I can work around it for now by 
increasing this value per-board, although I don't know how high to increase it (and 
I'm not sure typical users of f2fs with compression would know how to determine the right 
value either).

On Tue, Aug 25, 2020, at 12:25 PM, Jaegeuk Kim wrote:
Oh, can you try to get the diff from up-to-date f2fs?

# cd <5.8.3_branch>
# git diff <5.9-rc1_branch> fs/f2fs

On Tue, Aug 25, 2020 at 11:45 AM, 5kft <[email protected]> wrote:
Indeed these changes are present in 5.8.3 (copy from the compress.c on my 
build):

                  err = f2fs_write_compressed_pages(cc, submitted,
                                                          wbc, io_type);
                  cops->destroy_compress_ctx(cc);
                  kfree(cc->cpages);
                  cc->cpages = NULL;
                  if (!err)
                          return 0;

On Tue, Aug 25, 2020, at 11:37 AM, Jaegeuk Kim wrote:
Hi,

Thank you for the test and report. :)

Just to make sure there are no missing fixes: I guess the gap is the recent 
5.9-rc1 updates.
At a glance, a potential memory leak was fixed by the below commit among 
them. Could you give it a try?
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-stable.git/commit/?h=linux-5.4.y&id=721ef9e46dec3091fa7cd955da99ce83a850ab32

Thanks,


On Tue, Aug 25, 2020 at 11:09 AM, 5kft <[email protected]> wrote:
I did a little quick testing further on this problem, and I found that if I increase 
"vm.min_free_kbytes" then the allocations (not surprisingly) work and the 
failures go away.  E.g., this appears to make it work fine:

      sysctl -w vm.min_free_kbytes=65536

I didn't bisect this to find out what the lowest/safe minimum should be...
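
To see what a board is currently running with before raising it, the value can be read back from /proc/sys/vm/min_free_kbytes, which is the procfs file behind that sysctl. The helper below is just a small sketch; read_kbytes_sysctl() is a made-up name, and the path is passed in so it can be pointed at any file holding a single number.

```c
#include <stdio.h>

/*
 * Read a single integer (in kB) from a sysctl-style file such as
 * /proc/sys/vm/min_free_kbytes. Returns -1 if the file cannot be
 * opened or does not start with a number.
 */
long read_kbytes_sysctl(const char *path)
{
	FILE *f = fopen(path, "r");
	long kb = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &kb) != 1)
		kb = -1;
	fclose(f);
	return kb;
}
```

Calling read_kbytes_sysctl("/proc/sys/vm/min_free_kbytes") before and after the sysctl command above should show the change taking effect.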

Is there a way that F2FS should indicate that a change like this may be 
necessary when using zstd compression on some platforms?  Perhaps this is just 
a documentation addition?  I just want to save others from the pain of a 
potentially corrupted filesystem when using zstd compression because F2FS was 
internally running out of memory (which is what happened to me...)

Thanks!

On Tue, Aug 25, 2020, at 7:47 AM, 5kft wrote:
Hi Jaegeuk,

First, I'd like to apologize in advance if a direct email isn't appropriate for 
reporting bugs in f2fs; I'm not sure what the accepted process is for reporting 
issues in F2FS.

I am a contributor to the Armbian project (https://www.armbian.com/ and 
https://github.com/armbian), and have been using compression in F2FS for some time now - 
very nice work - LZ4 compression works great!  Unfortunately, however, when I try using 
"zstd" compression, I consistently get numerous kernel page allocation failures 
(and not surprisingly in some cases corruption of data from the filesystem).  I've been 
seeing this for some time but finally got a few minutes to write this email to you.

What follows is an example of the problem on a small SBC (Nano Pi NEO Air - 
https://www.friendlyarm.com/index.php?route=product/product&product_id=151), 
although I have reproduced this issue on some 64-bit ARM A53 boards as well (e.g., 
w/1GB RAM, including the Nano Pi NEO2, NEO2 Black, etc.)  I have not tried zstd on 
an amd64 machine yet.

This filesystem is formatted with compression ("-O extra_attr,enable_compression"), and mounted to 
use zstd compression ("-o compress_algorithm=zstd"), and the root mount directory has compression 
enabled ("chattr +c mntpt").  After doing a simple test copy of a number of files to it, it started 
giving page allocation failures - example traps are provided below.

I'm not sure if there are some kernel memory parameters that need to be changed 
or something, but even so it seems to me that this sort of thing shouldn't 
happen by default with a filesystem :-)  Here are a couple of example failure 
cases, running on stable kernel 5.8.3:

[168053.070957] F2FS-fs (mmcblk0p1): Found nat_bits in checkpoint
[168053.742204] F2FS-fs (mmcblk0p1): Mounted with checkpoint version = 37a48fb3
[168170.268522] kworker/u8:1: page allocation failure: order:6, 
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[168170.268556] CPU: 3 PID: 7830 Comm: kworker/u8:1 Tainted: G         C        
5.8.3-sunxi #trunk
[168170.268559] Hardware name: Allwinner sun8i Family
[168170.268580] Workqueue: writeback wb_workfn (flush-179:24)
[168170.268611] [<c010d6d5>] (unwind_backtrace) from [<c0109a55>] 
(show_stack+0x11/0x14)
[168170.268624] [<c0109a55>] (show_stack) from [<c056d489>] 
(dump_stack+0x75/0x84)
[168170.268639] [<c056d489>] (dump_stack) from [<c0243b53>] 
(warn_alloc+0xa3/0x104)
[168170.268651] [<c0243b53>] (warn_alloc) from [<c024473b>] 
(__alloc_pages_nodemask+0xb87/0xc40)
[168170.268662] [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>] 
(kmalloc_order+0x19/0x38)
[168170.268672] [<c02267c5>] (kmalloc_order) from [<c02267fd>] 
(kmalloc_order_trace+0x19/0x90)
[168170.268685] [<c02267fd>] (kmalloc_order_trace) from [<c047c805>] 
(zstd_init_compress_ctx+0x51/0xfc)
[168170.268697] [<c047c805>] (zstd_init_compress_ctx) from [<c047e2bd>] 
(f2fs_write_multi_pages+0x269/0x68c)
[168170.268708] [<c047e2bd>] (f2fs_write_multi_pages) from [<c0465163>] 
(f2fs_write_cache_pages+0x3bf/0x538)
[168170.268718] [<c0465163>] (f2fs_write_cache_pages) from [<c046550f>] 
(f2fs_write_data_pages+0x233/0x264)
[168170.268730] [<c046550f>] (f2fs_write_data_pages) from [<c0210db5>] 
(do_writepages+0x35/0x98)
[168170.268745] [<c0210db5>] (do_writepages) from [<c0290c17>] 
(__writeback_single_inode+0x2f/0x358)
[168170.268757] [<c0290c17>] (__writeback_single_inode) from [<c02910c5>] 
(writeback_sb_inodes+0x185/0x378)
[168170.268766] [<c02910c5>] (writeback_sb_inodes) from [<c02912e9>] 
(__writeback_inodes_wb+0x31/0x88)
[168170.268776] [<c02912e9>] (__writeback_inodes_wb) from [<c0291533>] 
(wb_writeback+0x1f3/0x264)
[168170.268783] [<c0291533>] (wb_writeback) from [<c0292429>] 
(wb_workfn+0x24d/0x3a4)
[168170.268794] [<c0292429>] (wb_workfn) from [<c0130b2f>] 
(process_one_work+0x15f/0x3b0)
[168170.268803] [<c0130b2f>] (process_one_work) from [<c0130e7b>] 
(worker_thread+0xfb/0x3e0)
[168170.268813] [<c0130e7b>] (worker_thread) from [<c0135c3b>] 
(kthread+0xeb/0x10c)
[168170.268824] [<c0135c3b>] (kthread) from [<c0100159>] 
(ret_from_fork+0x11/0x38)
[168170.268829] Exception stack(0xccb67fb0 to 0xccb67ff8)
[168170.268835] 7fa0:                                     00000000 00000000 
00000000 00000000
[168170.268842] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
[168170.268848] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[168170.268853] Mem-Info:
[168170.268867] active_anon:2089 inactive_anon:5866 isolated_anon:0
                   active_file:41402 inactive_file:37715 isolated_file:0
                   unevictable:4 dirty:9162 writeback:90
                   slab_reclaimable:5935 slab_unreclaimable:10851
                   mapped:4694 shmem:881 pagetables:369 bounce:0
                   free:12678 free_pcp:201 free_cma:11324
[168170.268877] Node 0 active_anon:8356kB inactive_anon:23464kB 
active_file:165608kB inactive_file:150860kB unevictable:16kB isolated(anon):0kB 
isolated(file):0kB mapped:18776kB dirty:36648kB writeback:360kB shmem:3524kB 
writeback_tmp:0kB all_unreclaimable? no
[168170.268891] Normal free:50712kB min:6500kB low:7100kB high:7700kB 
reserved_highatomic:0KB active_anon:8356kB inactive_anon:23464kB 
active_file:165764kB inactive_file:150884kB unevictable:16kB 
writepending:36944kB present:524288kB managed:503888kB mlocked:16kB 
kernel_stack:1144kB pagetables:1476kB bounce:0kB free_pcp:828kB local_pcp:116kB 
free_cma:45296kB
[168170.268893] lowmem_reserve[]: 0 0 0
[168170.268899] Normal: 1096*4kB (UMEC) 217*8kB (UMEC) 132*16kB (UMEC) 82*32kB 
(UMEC) 283*64kB (UC) 72*128kB (C) 16*256kB (UC) 9*512kB (UC) 4*1024kB (C) 
0*2048kB 0*4096kB = 50984kB
[168170.268927] 80105 total pagecache pages
[168170.268933] 72 pages in swap cache
[168170.268937] Swap cache stats: add 5255, delete 5182, find 5492/6131
[168170.268939] Free swap  = 232484kB
[168170.268941] Total swap = 251940kB
[168170.268944] 131072 pages RAM
[168170.268946] 0 pages HighMem/MovableOnly
[168170.268948] 5100 pages reserved
[168170.268951] 32768 pages cma reserved
[168182.775001] warn_alloc: 84 callbacks suppressed
[168182.775115] kworker/u9:3: page allocation failure: order:9, 
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[168182.775235] CPU: 3 PID: 8168 Comm: kworker/u9:3 Tainted: G         C        
5.8.3-sunxi #trunk
[168182.775246] Hardware name: Allwinner sun8i Family
[168182.775367] Workqueue: f2fs_post_read_wq f2fs_post_read_work
[168182.775534] [<c010d6d5>] (unwind_backtrace) from [<c0109a55>] 
(show_stack+0x11/0x14)
[168182.775584] [<c0109a55>] (show_stack) from [<c056d489>] 
(dump_stack+0x75/0x84)
[168182.775658] [<c056d489>] (dump_stack) from [<c0243b53>] 
(warn_alloc+0xa3/0x104)
[168182.775689] [<c0243b53>] (warn_alloc) from [<c024473b>] 
(__alloc_pages_nodemask+0xb87/0xc40)
[168182.775731] [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>] 
(kmalloc_order+0x19/0x38)
[168182.775757] [<c02267c5>] (kmalloc_order) from [<c02267fd>] 
(kmalloc_order_trace+0x19/0x90)
[168182.775797] [<c02267fd>] (kmalloc_order_trace) from [<c047c665>] 
(zstd_init_decompress_ctx+0x21/0x88)
[168182.775837] [<c047c665>] (zstd_init_decompress_ctx) from [<c047e9cf>] 
(f2fs_decompress_pages+0x97/0x228)
[168182.775860] [<c047e9cf>] (f2fs_decompress_pages) from [<c045d0ab>] 
(__read_end_io+0xfb/0x130)
[168182.775871] [<c045d0ab>] (__read_end_io) from [<c045d141>] 
(f2fs_post_read_work+0x61/0x84)
[168182.775884] [<c045d141>] (f2fs_post_read_work) from [<c0130b2f>] 
(process_one_work+0x15f/0x3b0)
[168182.775893] [<c0130b2f>] (process_one_work) from [<c0130e7b>] 
(worker_thread+0xfb/0x3e0)
[168182.775905] [<c0130e7b>] (worker_thread) from [<c0135c3b>] 
(kthread+0xeb/0x10c)
[168182.775919] [<c0135c3b>] (kthread) from [<c0100159>] 
(ret_from_fork+0x11/0x38)
[168182.775924] Exception stack(0xcfd5ffb0 to 0xcfd5fff8)
[168182.775930] ffa0:                                     00000000 00000000 
00000000 00000000
[168182.775937] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000
[168182.775943] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[168182.775949] Mem-Info:
[168182.775968] active_anon:2361 inactive_anon:4620 isolated_anon:0
                   active_file:16267 inactive_file:15209 isolated_file:0
                   unevictable:4 dirty:3287 writeback:0
                   slab_reclaimable:5976 slab_unreclaimable:11441
                   mapped:3760 shmem:485 pagetables:396 bounce:0
                   free:60170 free_pcp:71 free_cma:25015
[168182.775980] Node 0 active_anon:9444kB inactive_anon:18480kB 
active_file:65068kB inactive_file:60836kB unevictable:16kB isolated(anon):0kB 
isolated(file):0kB mapped:15040kB dirty:13148kB writeback:0kB shmem:1940kB 
writeback_tmp:0kB all_unreclaimable? no
[168182.775995] Normal free:240680kB min:2404kB low:3004kB high:3604kB 
reserved_highatomic:0KB active_anon:9444kB inactive_anon:18480kB 
active_file:65068kB inactive_file:60836kB unevictable:16kB writepending:13112kB 
present:524288kB managed:503888kB mlocked:16kB kernel_stack:1168kB 
pagetables:1584kB bounce:0kB free_pcp:280kB local_pcp:16kB free_cma:100060kB
[168182.775996] lowmem_reserve[]: 0 0 0
[168182.776003] Normal: 4668*4kB (UMEC) 4945*8kB (UMEC) 3001*16kB (UEC) 
1684*32kB (UMEC) 584*64kB (UMEC) 157*128kB (UMEC) 39*256kB (UMEC) 12*512kB 
(UMC) 7*1024kB (UMC) 0*2048kB 0*4096kB = 240904kB
[168182.776032] 32082 total pagecache pages
[168182.776039] 66 pages in swap cache
[168182.776043] Swap cache stats: add 6730, delete 6663, find 5492/6140
[168182.776045] Free swap  = 227108kB
[168182.776047] Total swap = 251940kB
[168182.776050] 131072 pages RAM
[168182.776052] 0 pages HighMem/MovableOnly
[168182.776054] 5100 pages reserved
[168182.776056] 32768 pages cma reserved

Again, I've had no issues on any of my boards when using lz4 compression, only 
with zstd.  (I have not had an opportunity to try lzo-rle yet.)  I'm happy to 
try to provide more information if necessary.  Thanks!










_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

