I think one way to solve this issue is to add an extra mount option,
e.g. compress_mempool. Once the user specifies this option, we can allocate
and reserve memory for a single workspace of the compression algorithm.
During compression/decompression, once we cannot grab any more memory from
the system, we can let the process wait on that private mempool. This may
increase the latency of writing compressed data, but it decreases the failure
ratio on low-end devices.
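To make the idea concrete, here is a minimal userspace sketch of the fallback, not f2fs code. All names (ws_pool, ws_pool_get, failing_alloc, ...) are hypothetical, and the "wait" is only indicated by a NULL return because the sketch is single-threaded; an in-kernel version would sleep until the reserved workspace is released.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this only illustrates the proposal. */
struct ws_pool {
	void *reserved;      /* workspace preallocated at "mount" time */
	size_t size;         /* size of every workspace, in bytes */
	bool reserved_busy;  /* true while the reserved workspace is lent out */
};

/* Reserve one workspace up front, while memory is still available. */
int ws_pool_init(struct ws_pool *pool, size_t size)
{
	pool->reserved = malloc(size);
	if (!pool->reserved)
		return -1;
	pool->size = size;
	pool->reserved_busy = false;
	return 0;
}

/*
 * Try a normal allocation first. alloc_fn stands in for the system
 * allocator so that failure can be simulated. If it fails, hand out the
 * reserved workspace when it is free. NULL means the reserved workspace
 * is also in use: this is where a real implementation would wait.
 */
void *ws_pool_get(struct ws_pool *pool, void *(*alloc_fn)(size_t))
{
	void *ws = alloc_fn(pool->size);

	if (ws)
		return ws;
	if (!pool->reserved_busy) {
		pool->reserved_busy = true;
		return pool->reserved;
	}
	return NULL;
}

/* Return a workspace: release the reserved one, free any other. */
void ws_pool_put(struct ws_pool *pool, void *ws)
{
	if (ws == pool->reserved)
		pool->reserved_busy = false;
	else
		free(ws);
}

/* Drop the reserved workspace at "unmount" time. */
void ws_pool_destroy(struct ws_pool *pool)
{
	free(pool->reserved);
	pool->reserved = NULL;
}

/* Allocator stub that always fails, to simulate memory pressure. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

With failing_alloc the first ws_pool_get() returns the reserved workspace and the second returns NULL until ws_pool_put() is called, which is the latency-for-reliability trade-off described above.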
On 2020/9/1 11:07, 5kft wrote:
Thanks for the patch - I applied it against 5.9-rc2, and it seems to help...:
The test I am using for this is to copy the entire rootfs tree to a
zstd-compressed f2fs partition. Previously, even a vm.min_free_kbytes of 32768
wasn't enough to avoid the allocation traps for the copy; with this patch I'm
able to complete the entire copy without an error at vm.min_free_kbytes=32768.
However, if I try vm.min_free_kbytes=16384 (for example), then it still runs
out of memory and logs many traps. It still seems rather excessive to require
so much available memory...?
Example traps at the system default vm.min_free_kbytes of ~2800 (following
board boot):
[ 141.863780] kworker/u8:4: page allocation failure: order:6,
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[ 141.863810] CPU: 3 PID: 1444 Comm: kworker/u8:4 Tainted: G C
5.9.0-rc2-sunxi #trunk
[ 141.863812] Hardware name: Allwinner sun8i Family
[ 141.863833] Workqueue: writeback wb_workfn (flush-179:0)
[ 141.863859] [<c010d415>] (unwind_backtrace) from [<c01097a5>]
(show_stack+0x11/0x14)
[ 141.863872] [<c01097a5>] (show_stack) from [<c0573da1>]
(dump_stack+0x75/0x84)
[ 141.863888] [<c0573da1>] (dump_stack) from [<c0246163>]
(warn_alloc+0xa3/0x104)
[ 141.863899] [<c0246163>] (warn_alloc) from [<c0246d71>]
(__alloc_pages_nodemask+0xbad/0xc58)
[ 141.863911] [<c0246d71>] (__alloc_pages_nodemask) from [<c022a09f>]
(kmalloc_order+0x23/0x50)
[ 141.863920] [<c022a09f>] (kmalloc_order) from [<c022a0e5>]
(kmalloc_order_trace+0x19/0x90)
[ 141.863933] [<c022a0e5>] (kmalloc_order_trace) from [<c0481519>]
(zstd_init_compress_ctx+0x51/0xfc)
[ 141.863946] [<c0481519>] (zstd_init_compress_ctx) from [<c048304b>]
(f2fs_write_multi_pages+0x27b/0x6a0)
[ 141.863961] [<c048304b>] (f2fs_write_multi_pages) from [<c04699e3>]
(f2fs_write_cache_pages+0x3bf/0x538)
[ 141.863971] [<c04699e3>] (f2fs_write_cache_pages) from [<c0469d8f>]
(f2fs_write_data_pages+0x233/0x264)
[ 141.863985] [<c0469d8f>] (f2fs_write_data_pages) from [<c02139b9>]
(do_writepages+0x35/0x98)
[ 141.863995] [<c02139b9>] (do_writepages) from [<c02947ef>]
(__writeback_single_inode+0x2f/0x358)
[ 141.864004] [<c02947ef>] (__writeback_single_inode) from [<c0294c9d>]
(writeback_sb_inodes+0x185/0x378)
[ 141.864012] [<c0294c9d>] (writeback_sb_inodes) from [<c0294ec1>]
(__writeback_inodes_wb+0x31/0x88)
[ 141.864019] [<c0294ec1>] (__writeback_inodes_wb) from [<c029510b>]
(wb_writeback+0x1f3/0x264)
[ 141.864026] [<c029510b>] (wb_writeback) from [<c0296053>]
(wb_workfn+0x2a3/0x3a4)
[ 141.864035] [<c0296053>] (wb_workfn) from [<c0130313>]
(process_one_work+0x15f/0x3b0)
[ 141.864043] [<c0130313>] (process_one_work) from [<c013065f>]
(worker_thread+0xfb/0x3e0)
[ 141.864053] [<c013065f>] (worker_thread) from [<c0135407>]
(kthread+0xeb/0x10c)
[ 141.864063] [<c0135407>] (kthread) from [<c0100159>]
(ret_from_fork+0x11/0x38)
[ 141.864067] Exception stack(0xcf153fb0 to 0xcf153ff8)
[ 141.864073] 3fa0: 00000000 00000000
00000000 00000000
[ 141.864079] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[ 141.864084] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 141.864089] Mem-Info:
[ 141.864103] active_anon:105 inactive_anon:9374 isolated_anon:0
active_file:12581 inactive_file:77234 isolated_file:32
unevictable:4 dirty:11187 writeback:174
slab_reclaimable:3566 slab_unreclaimable:6038
mapped:5698 shmem:414 pagetables:348 bounce:0
free:10114 free_pcp:223 free_cma:8329
[ 141.864114] Node 0 active_anon:420kB inactive_anon:37496kB
active_file:50324kB inactive_file:308936kB unevictable:16kB isolated(anon):0kB
isolated(file):128kB mapped:22792kB dirty:44748kB writeback:696kB shmem:1656kB
writeback_tmp:0kB kernel_stack:1216kB all_unreclaimable? no
[ 141.864127] Normal free:40456kB min:6904kB low:7604kB high:8304kB
reserved_highatomic:0KB active_anon:420kB inactive_anon:37496kB
active_file:50248kB inactive_file:308768kB unevictable:16kB
writepending:45608kB present:524288kB managed:503884kB mlocked:16kB
pagetables:1392kB bounce:0kB free_pcp:892kB local_pcp:176kB free_cma:33316kB
[ 141.864129] lowmem_reserve[]: 0 0 0
[ 141.864135] Normal: 88*4kB (UMEC) 107*8kB (UMEC) 51*16kB (UMEC) 29*32kB
(UMEC) 13*64kB (UMEC) 2*128kB (UE) 3*256kB (UC) 2*512kB (U) 2*1024kB (U)
0*2048kB 8*4096kB (C) = 40648kB
[ 141.864162] 90296 total pagecache pages
[ 141.864168] 0 pages in swap cache
[ 141.864171] Swap cache stats: add 0, delete 0, find 0/0
[ 141.864173] Free swap = 251940kB
[ 141.864175] Total swap = 251940kB
[ 141.864177] 131072 pages RAM
[ 141.864179] 0 pages HighMem/MovableOnly
[ 141.864181] 5101 pages reserved
[ 141.864184] 32768 pages cma reserved
[ 155.171118] warn_alloc: 23 callbacks suppressed
[ 155.171143] kworker/u8:4: page allocation failure: order:6,
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[ 155.171168] CPU: 1 PID: 1444 Comm: kworker/u8:4 Tainted: G C
5.9.0-rc2-sunxi #trunk
[ 155.171172] Hardware name: Allwinner sun8i Family
[ 155.171195] Workqueue: writeback wb_workfn (flush-179:0)
[ 155.171229] [<c010d415>] (unwind_backtrace) from [<c01097a5>]
(show_stack+0x11/0x14)
[ 155.171243] [<c01097a5>] (show_stack) from [<c0573da1>]
(dump_stack+0x75/0x84)
[ 155.171266] [<c0573da1>] (dump_stack) from [<c0246163>]
(warn_alloc+0xa3/0x104)
[ 155.171281] [<c0246163>] (warn_alloc) from [<c0246d71>]
(__alloc_pages_nodemask+0xbad/0xc58)
[ 155.171294] [<c0246d71>] (__alloc_pages_nodemask) from [<c022a09f>]
(kmalloc_order+0x23/0x50)
[ 155.171304] [<c022a09f>] (kmalloc_order) from [<c022a0e5>]
(kmalloc_order_trace+0x19/0x90)
[ 155.171320] [<c022a0e5>] (kmalloc_order_trace) from [<c0481519>]
(zstd_init_compress_ctx+0x51/0xfc)
[ 155.171334] [<c0481519>] (zstd_init_compress_ctx) from [<c048304b>]
(f2fs_write_multi_pages+0x27b/0x6a0)
[ 155.171349] [<c048304b>] (f2fs_write_multi_pages) from [<c04699e3>]
(f2fs_write_cache_pages+0x3bf/0x538)
[ 155.171359] [<c04699e3>] (f2fs_write_cache_pages) from [<c0469d8f>]
(f2fs_write_data_pages+0x233/0x264)
[ 155.171374] [<c0469d8f>] (f2fs_write_data_pages) from [<c02139b9>]
(do_writepages+0x35/0x98)
[ 155.171385] [<c02139b9>] (do_writepages) from [<c02947ef>]
(__writeback_single_inode+0x2f/0x358)
[ 155.171394] [<c02947ef>] (__writeback_single_inode) from [<c0294c9d>]
(writeback_sb_inodes+0x185/0x378)
[ 155.171402] [<c0294c9d>] (writeback_sb_inodes) from [<c0294ec1>]
(__writeback_inodes_wb+0x31/0x88)
[ 155.171409] [<c0294ec1>] (__writeback_inodes_wb) from [<c029510b>]
(wb_writeback+0x1f3/0x264)
[ 155.171417] [<c029510b>] (wb_writeback) from [<c0295ffd>]
(wb_workfn+0x24d/0x3a4)
[ 155.171428] [<c0295ffd>] (wb_workfn) from [<c0130313>]
(process_one_work+0x15f/0x3b0)
[ 155.171437] [<c0130313>] (process_one_work) from [<c013065f>]
(worker_thread+0xfb/0x3e0)
[ 155.171447] [<c013065f>] (worker_thread) from [<c0135407>]
(kthread+0xeb/0x10c)
[ 155.171457] [<c0135407>] (kthread) from [<c0100159>]
(ret_from_fork+0x11/0x38)
[ 155.171462] Exception stack(0xcf153fb0 to 0xcf153ff8)
[ 155.171468] 3fa0: 00000000 00000000
00000000 00000000
[ 155.171474] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[ 155.171480] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 155.171488] Mem-Info:
[ 155.171504] active_anon:105 inactive_anon:9403 isolated_anon:0
active_file:17189 inactive_file:52888 isolated_file:0
unevictable:4 dirty:11785 writeback:50
slab_reclaimable:4217 slab_unreclaimable:6052
mapped:5706 shmem:414 pagetables:349 bounce:0
free:29132 free_pcp:340 free_cma:27347
[ 155.171516] Node 0 active_anon:420kB inactive_anon:37612kB
active_file:68756kB inactive_file:211552kB unevictable:16kB isolated(anon):0kB
isolated(file):0kB mapped:22824kB dirty:47140kB writeback:200kB shmem:1656kB
writeback_tmp:0kB kernel_stack:1216kB all_unreclaimable? no
[ 155.171531] Normal free:116528kB min:6904kB low:7604kB high:8304kB
reserved_highatomic:0KB active_anon:420kB inactive_anon:37612kB
active_file:68680kB inactive_file:211696kB unevictable:16kB
writepending:47352kB present:524288kB managed:503884kB mlocked:16kB
pagetables:1396kB bounce:0kB free_pcp:1356kB local_pcp:8kB free_cma:109388kB
[ 155.171534] lowmem_reserve[]: 0 0 0
[ 155.171540] Normal: 365*4kB (UMEC) 188*8kB (UMEC) 153*16kB (UMC) 111*32kB
(UMC) 73*64kB (UMC) 44*128kB (UC) 33*256kB (UC) 18*512kB (UC) 18*1024kB (UC)
6*2048kB (C) 12*4096kB (C) = 116804kB
[ 155.171568] 70535 total pagecache pages
[ 155.171576] 0 pages in swap cache
[ 155.171579] Swap cache stats: add 0, delete 0, find 0/0
[ 155.171581] Free swap = 251940kB
[ 155.171583] Total swap = 251940kB
[ 155.171585] 131072 pages RAM
[ 155.171587] 0 pages HighMem/MovableOnly
[ 155.171590] 5101 pages reserved
[ 155.171592] 32768 pages cma reserved
On Mon, Aug 31, 2020, at 6:39 PM, Chao Yu wrote:
Hi,
We should align the max compress window size of zstd to the cluster size of
the current inode. By default, the cluster size is 16KB (log size is 2), so
this can significantly reduce the size of the allocated memory.
So, could you please try below patch first?
From c4bf178e5133525027d817a2ac542db6f5621c4f Mon Sep 17 00:00:00 2001
From: Chao Yu <[email protected]>
Date: Tue, 1 Sep 2020 09:29:08 +0800
Subject: [PATCH] fix memory allocation failure on zstd decompression
Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/compress.c | 7 ++++---
fs/f2fs/f2fs.h | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index df097c4a71e1..357303d8514b 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -382,16 +382,17 @@ static int zstd_init_decompress_ctx(struct decompress_io_ctx *dic)
 	ZSTD_DStream *stream;
 	void *workspace;
 	unsigned int workspace_size;
+	unsigned int max_window_size =
+			MAX_COMPRESS_WINDOW_SIZE(dic->log_cluster_size);
-	workspace_size = ZSTD_DStreamWorkspaceBound(MAX_COMPRESS_WINDOW_SIZE);
+	workspace_size = ZSTD_DStreamWorkspaceBound(max_window_size);
 	workspace = f2fs_kvmalloc(F2FS_I_SB(dic->inode),
 					workspace_size, GFP_NOFS);
 	if (!workspace)
 		return -ENOMEM;
-	stream = ZSTD_initDStream(MAX_COMPRESS_WINDOW_SIZE,
-					workspace, workspace_size);
+	stream = ZSTD_initDStream(max_window_size, workspace, workspace_size);
 	if (!stream) {
 		printk_ratelimited("%sF2FS-fs (%s): %s ZSTD_initDStream failed\n",
 				KERN_ERR, F2FS_I_SB(dic->inode)->sb->s_id,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 21f86001bb3a..d210809292f9 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1419,7 +1419,7 @@ struct decompress_io_ctx {
 #define NULL_CLUSTER ((unsigned int)(~0))
 #define MIN_COMPRESS_LOG_SIZE 2
 #define MAX_COMPRESS_LOG_SIZE 8
-#define MAX_COMPRESS_WINDOW_SIZE ((PAGE_SIZE) << MAX_COMPRESS_LOG_SIZE)
+#define MAX_COMPRESS_WINDOW_SIZE(log_size) ((PAGE_SIZE) << (log_size))
 struct f2fs_sb_info {
 	struct super_block *sb; /* pointer to VFS super block */
--
2.26.2
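As a rough illustration of what the macro change in the patch above does to the window size, the sketch below evaluates it for a few log sizes. It assumes 4KB pages, which matches the logs in this thread (131072 pages for 524288kB); EXAMPLE_PAGE_SIZE and window_size_kib() are names made up for this example. The zstd workspace bound itself is not computed here, only the window size passed to it.

```c
/* Assumption: 4KB pages, as on the 32-bit ARM board in this thread. */
#define EXAMPLE_PAGE_SIZE 4096UL

#define MIN_COMPRESS_LOG_SIZE 2
#define MAX_COMPRESS_LOG_SIZE 8

/* Same shape as the patched macro, with the page size spelled out. */
#define MAX_COMPRESS_WINDOW_SIZE(log_size) ((EXAMPLE_PAGE_SIZE) << (log_size))

/* Hypothetical helper: window size in KB for a given log cluster size. */
unsigned long window_size_kib(unsigned int log_cluster_size)
{
	return MAX_COMPRESS_WINDOW_SIZE(log_cluster_size) / 1024;
}
```

Under that assumption the default log size of 2 gives a 16KB window, 5 gives 128KB, and the old fixed value of 8 gives 1024KB.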
On 2020/9/1 2:14, 5kft wrote:
Sounds good :-) Perhaps it's simply that zstd needs a lot of memory to operate; however,
it's unfortunate that it doesn't work on smaller platforms "out of the box"
like lz4 does. Should there be a note or guidance of some sort regarding this for
smaller embedded platforms?
On Mon, Aug 31, 2020, at 11:04 AM, Jaegeuk Kim wrote:
Let me add more f2fs folks. :)
On 08/27, 5kft wrote:
(Note that for testing this I backported f2fs from 5.9-rc2 into 5.8.5, as I
don't have 5.9 working on these boards yet.)
On Thu, Aug 27, 2020, at 7:39 AM, 5kft wrote:
Quick update - I encounter the problem with f2fs zstd compression in the
mainline 5.9-rc2 kernel as well - e.g.,
[ 67.668529] F2FS-fs (mmcblk0p1): Found nat_bits in checkpoint
[ 68.339021] F2FS-fs (mmcblk0p1): Mounted with checkpoint version = 76732978
[ 93.862327] kworker/u8:2: page allocation failure: order:6,
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[ 93.862360] CPU: 0 PID: 187 Comm: kworker/u8:2 Tainted: G C
5.8.5-sunxi #trunk
[ 93.862364] Hardware name: Allwinner sun8i Family
[ 93.862388] Workqueue: writeback wb_workfn (flush-179:0)
[ 93.862424] [<c010d6d5>] (unwind_backtrace) from [<c0109a55>]
(show_stack+0x11/0x14)
[ 93.862439] [<c0109a55>] (show_stack) from [<c056eae9>]
(dump_stack+0x75/0x84)
[ 93.862456] [<c056eae9>] (dump_stack) from [<c0243b8f>]
(warn_alloc+0xa3/0x104)
[ 93.862469] [<c0243b8f>] (warn_alloc) from [<c0244777>]
(__alloc_pages_nodemask+0xb87/0xc40)
[ 93.862483] [<c0244777>] (__alloc_pages_nodemask) from [<c02267fd>]
(kmalloc_order+0x19/0x38)
[ 93.862492] [<c02267fd>] (kmalloc_order) from [<c0226835>]
(kmalloc_order_trace+0x19/0x90)
[ 93.862506] [<c0226835>] (kmalloc_order_trace) from [<c047ddf5>]
(zstd_init_compress_ctx+0x51/0xfc)
[ 93.862518] [<c047ddf5>] (zstd_init_compress_ctx) from [<c047f90b>]
(f2fs_write_multi_pages+0x27b/0x6a0)
[ 93.862532] [<c047f90b>] (f2fs_write_multi_pages) from [<c046630d>]
(f2fs_write_cache_pages+0x415/0x538)
[ 93.862542] [<c046630d>] (f2fs_write_cache_pages) from [<c0466663>]
(f2fs_write_data_pages+0x233/0x264)
[ 93.862555] [<c0466663>] (f2fs_write_data_pages) from [<c0210ded>]
(do_writepages+0x35/0x98)
[ 93.862571] [<c0210ded>] (do_writepages) from [<c0290c4f>]
(__writeback_single_inode+0x2f/0x358)
[ 93.862584] [<c0290c4f>] (__writeback_single_inode) from [<c02910fd>]
(writeback_sb_inodes+0x185/0x378)
[ 93.862594] [<c02910fd>] (writeback_sb_inodes) from [<c0291321>]
(__writeback_inodes_wb+0x31/0x88)
[ 93.862603] [<c0291321>] (__writeback_inodes_wb) from [<c029156b>]
(wb_writeback+0x1f3/0x264)
[ 93.862612] [<c029156b>] (wb_writeback) from [<c0292461>]
(wb_workfn+0x24d/0x3a4)
[ 93.862624] [<c0292461>] (wb_workfn) from [<c0130b2f>]
(process_one_work+0x15f/0x3b0)
[ 93.862634] [<c0130b2f>] (process_one_work) from [<c0130e7b>]
(worker_thread+0xfb/0x3e0)
[ 93.862646] [<c0130e7b>] (worker_thread) from [<c0135c3b>]
(kthread+0xeb/0x10c)
[ 93.862656] [<c0135c3b>] (kthread) from [<c0100159>]
(ret_from_fork+0x11/0x38)
[ 93.862661] Exception stack(0xd4167fb0 to 0xd4167ff8)
[ 93.862667] 7fa0: 00000000 00000000
00000000 00000000
[ 93.862674] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[ 93.862680] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 93.862686] Mem-Info:
[ 93.862699] active_anon:3457 inactive_anon:6470 isolated_anon:32
active_file:14148 inactive_file:75224 isolated_file:0
unevictable:4 dirty:10374 writeback:151
slab_reclaimable:4946 slab_unreclaimable:8951
mapped:5557 shmem:414 pagetables:332 bounce:0
free:5946 free_pcp:118 free_cma:4292
[ 93.862709] Node 0 active_anon:13828kB inactive_anon:26032kB
active_file:56592kB inactive_file:300896kB unevictable:16kB isolated(anon):0kB
isolated(file):0kB mapped:22228kB dirty:41496kB writeback:604kB shmem:1656kB
writeback_tmp:0kB all_unreclaimable? no
[ 93.862725] Normal free:23784kB min:6904kB low:7604kB high:8304kB
reserved_highatomic:0KB active_anon:13956kB inactive_anon:25800kB
active_file:56592kB inactive_file:301212kB unevictable:16kB
writepending:42024kB present:524288kB managed:503888kB mlocked:16kB
kernel_stack:1200kB pagetables:1328kB bounce:0kB free_pcp:472kB local_pcp:196kB
free_cma:17168kB
[ 93.862727] lowmem_reserve[]: 0 0 0
[ 93.862734] Normal: 95*4kB (UMEC) 122*8kB (UMEC) 45*16kB (UMEC) 32*32kB
(UMEC) 17*64kB (UMEC) 7*128kB (UMEC) 4*256kB (U) 3*512kB (UC) 0*1024kB 0*2048kB
4*4096kB (C) = 24028kB
[ 93.862762] 89790 total pagecache pages
[ 93.862768] 0 pages in swap cache
[ 93.862771] Swap cache stats: add 0, delete 0, find 0/0
[ 93.862773] Free swap = 251940kB
[ 93.862775] Total swap = 251940kB
[ 93.862777] 131072 pages RAM
[ 93.862780] 0 pages HighMem/MovableOnly
[ 93.862782] 5100 pages reserved
[ 93.862784] 32768 pages cma reserved
I haven't tried lowering MAX_COMPRESS_LOG_SIZE in this kernel yet but will test
this when I can.
On Tue, Aug 25, 2020, at 1:31 PM, 5kft wrote:
Note that I don't think that this particular problem is a memleak as it happens
very quickly when simply copying files to the zstd-mounted filesystem - but I
haven't been able to compare the 5.8.3 changes to 5.9-rc1 yet. This particular
board boots up with vm.min_free_kbytes = 2406, which seems pretty low, but the
board only has 512MB RAM on it total. Kind of crazy I know, but it's a good
test case for this problem :-) Also, again lz4 compression works fine at this
low value.
I'm not sure that this particular change (lowering MAX_COMPRESS_LOG_SIZE) helps
significantly. I'm still seeing the failures even with vm.min_free_kbytes =
32768 (and this seems like a rather high value compared to the default).
On Tue, Aug 25, 2020, at 12:43 PM, Jaegeuk Kim wrote:
So, if there's no memleak in f2fs but we need to do something like that, I feel
that something is misconfigured in f2fs wrt zstd.
I took a look at the zstd initialization flow; it seems f2fs is asking for too much
memory for the workspace compared with btrfs.
Could you please check whether replacing the below "8" with "5" mitigates the problem?
("5" is used in btrfs.)
In fs/f2fs/f2fs.h,
#define MAX_COMPRESS_LOG_SIZE 8
On Tue, Aug 25, 2020, at 12:30 PM, 5kft <[email protected]> wrote:
Will do! Quick question - should these changes handle a low
"vm.min_free_kbytes" situation with f2fs? I can work around it for now by
increasing this value per-board, although I don't know how high to increase it (and
I'm not sure typical users of f2fs with compression would know how to determine the right
value either).
On Tue, Aug 25, 2020, at 12:25 PM, Jaegeuk Kim wrote:
Oh, can you try to get the diff from up-to-date f2fs?
# cd <5.8.3_branch>
# git diff <5.9-rc1_branch> fs/f2fs
On Tue, Aug 25, 2020, at 11:45 AM, 5kft <[email protected]> wrote:
Indeed these changes are present in 5.8.3 (copy from the compress.c on my
build):
	err = f2fs_write_compressed_pages(cc, submitted,
					wbc, io_type);
	cops->destroy_compress_ctx(cc);
	kfree(cc->cpages);
	cc->cpages = NULL;
	if (!err)
		return 0;
On Tue, Aug 25, 2020, at 11:37 AM, Jaegeuk Kim wrote:
Hi,
Thank you for the test and report. :)
Just to check whether any fixes are missing: I guess the gap is the recent
5.9-rc1 updates.
At a glance, a potential memory leak was fixed by the commit below, among
others. Could you give it a try?
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-stable.git/commit/?h=linux-5.4.y&id=721ef9e46dec3091fa7cd955da99ce83a850ab32
Thanks,
On Tue, Aug 25, 2020, at 11:09 AM, 5kft <[email protected]> wrote:
I did a little quick testing further on this problem, and I found that if I increase
"vm.min_free_kbytes" then the allocations (not surprisingly) work and the
failures go away. E.g., this appears to make it work fine:
sysctl -w vm.min_free_kbytes=65536
I didn't bisect this to find out what the lowest/safe minimum should be...
Is there a way that F2FS should indicate that a change like this may be
necessary when using zstd compression on some platforms? Perhaps this is just
a documentation addition? I just want to save others from the pain of a
potentially corrupted filesystem when using zstd compression because F2FS was
internally running out of memory (which is what happened to me...)
Thanks!
On Tue, Aug 25, 2020, at 7:47 AM, 5kft wrote:
Hi Jaegeuk,
First, I'd like to apologize in advance if a direct email isn't appropriate for
reporting bugs in f2fs; I'm not sure what the accepted process is for reporting
issues in F2FS.
I am a contributor to the Armbian project (https://www.armbian.com/ and
https://github.com/armbian), and have been using compression in F2FS for some time now -
very nice work - LZ4 compression works great! Unfortunately, however, when I try using
"zstd" compression, I consistently get numerous kernel page allocation failures
(and not surprisingly in some cases corruption of data from the filesystem). I've been
seeing this for some time but finally got a few minutes to write this email to you.
What follows is an example of the problem on a small SBC (Nano Pi NEO Air -
https://www.friendlyarm.com/index.php?route=product/product&product_id=151),
although I have reproduced this issue on some 64-bit ARM A53 boards as well (e.g.,
w/1GB RAM, including the Nano Pi NEO2, NEO2 Black, etc.) I have not tried zstd on
an amd64 machine yet.
This filesystem is formatted with compression ("-O extra_attr,enable_compression"), and mounted to
use zstd compression ("-o compress_algorithm=zstd"), and the root mount directory has compression
enabled ("chattr +c mntpt"). After doing a simple test copy of a number of files to it, it started
giving page allocation failures - example traps are provided below.
I'm not sure if there are some kernel memory parameters that need to be changed
or something, but even so it seems to me that this sort of thing shouldn't
happen by default with a filesystem :-) Here are a couple of example failure
cases, running on stable kernel 5.8.3:
[168053.070957] F2FS-fs (mmcblk0p1): Found nat_bits in checkpoint
[168053.742204] F2FS-fs (mmcblk0p1): Mounted with checkpoint version = 37a48fb3
[168170.268522] kworker/u8:1: page allocation failure: order:6,
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[168170.268556] CPU: 3 PID: 7830 Comm: kworker/u8:1 Tainted: G C
5.8.3-sunxi #trunk
[168170.268559] Hardware name: Allwinner sun8i Family
[168170.268580] Workqueue: writeback wb_workfn (flush-179:24)
[168170.268611] [<c010d6d5>] (unwind_backtrace) from [<c0109a55>]
(show_stack+0x11/0x14)
[168170.268624] [<c0109a55>] (show_stack) from [<c056d489>]
(dump_stack+0x75/0x84)
[168170.268639] [<c056d489>] (dump_stack) from [<c0243b53>]
(warn_alloc+0xa3/0x104)
[168170.268651] [<c0243b53>] (warn_alloc) from [<c024473b>]
(__alloc_pages_nodemask+0xb87/0xc40)
[168170.268662] [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>]
(kmalloc_order+0x19/0x38)
[168170.268672] [<c02267c5>] (kmalloc_order) from [<c02267fd>]
(kmalloc_order_trace+0x19/0x90)
[168170.268685] [<c02267fd>] (kmalloc_order_trace) from [<c047c805>]
(zstd_init_compress_ctx+0x51/0xfc)
[168170.268697] [<c047c805>] (zstd_init_compress_ctx) from [<c047e2bd>]
(f2fs_write_multi_pages+0x269/0x68c)
[168170.268708] [<c047e2bd>] (f2fs_write_multi_pages) from [<c0465163>]
(f2fs_write_cache_pages+0x3bf/0x538)
[168170.268718] [<c0465163>] (f2fs_write_cache_pages) from [<c046550f>]
(f2fs_write_data_pages+0x233/0x264)
[168170.268730] [<c046550f>] (f2fs_write_data_pages) from [<c0210db5>]
(do_writepages+0x35/0x98)
[168170.268745] [<c0210db5>] (do_writepages) from [<c0290c17>]
(__writeback_single_inode+0x2f/0x358)
[168170.268757] [<c0290c17>] (__writeback_single_inode) from [<c02910c5>]
(writeback_sb_inodes+0x185/0x378)
[168170.268766] [<c02910c5>] (writeback_sb_inodes) from [<c02912e9>]
(__writeback_inodes_wb+0x31/0x88)
[168170.268776] [<c02912e9>] (__writeback_inodes_wb) from [<c0291533>]
(wb_writeback+0x1f3/0x264)
[168170.268783] [<c0291533>] (wb_writeback) from [<c0292429>]
(wb_workfn+0x24d/0x3a4)
[168170.268794] [<c0292429>] (wb_workfn) from [<c0130b2f>]
(process_one_work+0x15f/0x3b0)
[168170.268803] [<c0130b2f>] (process_one_work) from [<c0130e7b>]
(worker_thread+0xfb/0x3e0)
[168170.268813] [<c0130e7b>] (worker_thread) from [<c0135c3b>]
(kthread+0xeb/0x10c)
[168170.268824] [<c0135c3b>] (kthread) from [<c0100159>]
(ret_from_fork+0x11/0x38)
[168170.268829] Exception stack(0xccb67fb0 to 0xccb67ff8)
[168170.268835] 7fa0: 00000000 00000000
00000000 00000000
[168170.268842] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[168170.268848] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[168170.268853] Mem-Info:
[168170.268867] active_anon:2089 inactive_anon:5866 isolated_anon:0
active_file:41402 inactive_file:37715 isolated_file:0
unevictable:4 dirty:9162 writeback:90
slab_reclaimable:5935 slab_unreclaimable:10851
mapped:4694 shmem:881 pagetables:369 bounce:0
free:12678 free_pcp:201 free_cma:11324
[168170.268877] Node 0 active_anon:8356kB inactive_anon:23464kB
active_file:165608kB inactive_file:150860kB unevictable:16kB isolated(anon):0kB
isolated(file):0kB mapped:18776kB dirty:36648kB writeback:360kB shmem:3524kB
writeback_tmp:0kB all_unreclaimable? no
[168170.268891] Normal free:50712kB min:6500kB low:7100kB high:7700kB
reserved_highatomic:0KB active_anon:8356kB inactive_anon:23464kB
active_file:165764kB inactive_file:150884kB unevictable:16kB
writepending:36944kB present:524288kB managed:503888kB mlocked:16kB
kernel_stack:1144kB pagetables:1476kB bounce:0kB free_pcp:828kB local_pcp:116kB
free_cma:45296kB
[168170.268893] lowmem_reserve[]: 0 0 0
[168170.268899] Normal: 1096*4kB (UMEC) 217*8kB (UMEC) 132*16kB (UMEC) 82*32kB
(UMEC) 283*64kB (UC) 72*128kB (C) 16*256kB (UC) 9*512kB (UC) 4*1024kB (C)
0*2048kB 0*4096kB = 50984kB
[168170.268927] 80105 total pagecache pages
[168170.268933] 72 pages in swap cache
[168170.268937] Swap cache stats: add 5255, delete 5182, find 5492/6131
[168170.268939] Free swap = 232484kB
[168170.268941] Total swap = 251940kB
[168170.268944] 131072 pages RAM
[168170.268946] 0 pages HighMem/MovableOnly
[168170.268948] 5100 pages reserved
[168170.268951] 32768 pages cma reserved
[168182.775001] warn_alloc: 84 callbacks suppressed
[168182.775115] kworker/u9:3: page allocation failure: order:9,
mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[168182.775235] CPU: 3 PID: 8168 Comm: kworker/u9:3 Tainted: G C
5.8.3-sunxi #trunk
[168182.775246] Hardware name: Allwinner sun8i Family
[168182.775367] Workqueue: f2fs_post_read_wq f2fs_post_read_work
[168182.775534] [<c010d6d5>] (unwind_backtrace) from [<c0109a55>]
(show_stack+0x11/0x14)
[168182.775584] [<c0109a55>] (show_stack) from [<c056d489>]
(dump_stack+0x75/0x84)
[168182.775658] [<c056d489>] (dump_stack) from [<c0243b53>]
(warn_alloc+0xa3/0x104)
[168182.775689] [<c0243b53>] (warn_alloc) from [<c024473b>]
(__alloc_pages_nodemask+0xb87/0xc40)
[168182.775731] [<c024473b>] (__alloc_pages_nodemask) from [<c02267c5>]
(kmalloc_order+0x19/0x38)
[168182.775757] [<c02267c5>] (kmalloc_order) from [<c02267fd>]
(kmalloc_order_trace+0x19/0x90)
[168182.775797] [<c02267fd>] (kmalloc_order_trace) from [<c047c665>]
(zstd_init_decompress_ctx+0x21/0x88)
[168182.775837] [<c047c665>] (zstd_init_decompress_ctx) from [<c047e9cf>]
(f2fs_decompress_pages+0x97/0x228)
[168182.775860] [<c047e9cf>] (f2fs_decompress_pages) from [<c045d0ab>]
(__read_end_io+0xfb/0x130)
[168182.775871] [<c045d0ab>] (__read_end_io) from [<c045d141>]
(f2fs_post_read_work+0x61/0x84)
[168182.775884] [<c045d141>] (f2fs_post_read_work) from [<c0130b2f>]
(process_one_work+0x15f/0x3b0)
[168182.775893] [<c0130b2f>] (process_one_work) from [<c0130e7b>]
(worker_thread+0xfb/0x3e0)
[168182.775905] [<c0130e7b>] (worker_thread) from [<c0135c3b>]
(kthread+0xeb/0x10c)
[168182.775919] [<c0135c3b>] (kthread) from [<c0100159>]
(ret_from_fork+0x11/0x38)
[168182.775924] Exception stack(0xcfd5ffb0 to 0xcfd5fff8)
[168182.775930] ffa0: 00000000 00000000
00000000 00000000
[168182.775937] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[168182.775943] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[168182.775949] Mem-Info:
[168182.775968] active_anon:2361 inactive_anon:4620 isolated_anon:0
active_file:16267 inactive_file:15209 isolated_file:0
unevictable:4 dirty:3287 writeback:0
slab_reclaimable:5976 slab_unreclaimable:11441
mapped:3760 shmem:485 pagetables:396 bounce:0
free:60170 free_pcp:71 free_cma:25015
[168182.775980] Node 0 active_anon:9444kB inactive_anon:18480kB
active_file:65068kB inactive_file:60836kB unevictable:16kB isolated(anon):0kB
isolated(file):0kB mapped:15040kB dirty:13148kB writeback:0kB shmem:1940kB
writeback_tmp:0kB all_unreclaimable? no
[168182.775995] Normal free:240680kB min:2404kB low:3004kB high:3604kB
reserved_highatomic:0KB active_anon:9444kB inactive_anon:18480kB
active_file:65068kB inactive_file:60836kB unevictable:16kB writepending:13112kB
present:524288kB managed:503888kB mlocked:16kB kernel_stack:1168kB
pagetables:1584kB bounce:0kB free_pcp:280kB local_pcp:16kB free_cma:100060kB
[168182.775996] lowmem_reserve[]: 0 0 0
[168182.776003] Normal: 4668*4kB (UMEC) 4945*8kB (UMEC) 3001*16kB (UEC)
1684*32kB (UMEC) 584*64kB (UMEC) 157*128kB (UMEC) 39*256kB (UMEC) 12*512kB
(UMC) 7*1024kB (UMC) 0*2048kB 0*4096kB = 240904kB
[168182.776032] 32082 total pagecache pages
[168182.776039] 66 pages in swap cache
[168182.776043] Swap cache stats: add 6730, delete 6663, find 5492/6140
[168182.776045] Free swap = 227108kB
[168182.776047] Total swap = 251940kB
[168182.776050] 131072 pages RAM
[168182.776052] 0 pages HighMem/MovableOnly
[168182.776054] 5100 pages reserved
[168182.776056] 32768 pages cma reserved
Again, I've had no issues on any of my boards when using lz4 compression, only
with zstd. (I have not had an opportunity to try lzo-rle yet.) I'm happy to
try to provide more information if necessary. Thanks!
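For reference, the "order" value in the failure messages above is the base-2 logarithm of the number of physically contiguous pages requested. A small sketch of the arithmetic, assuming 4KB pages (consistent with "131072 pages RAM" on this 512MB board); the function names are made up for this example:

```c
/* Assumption: 4KB pages, as on the board in this report. */
#define EXAMPLE_PAGE_SIZE 4096UL

/* Bytes of contiguous memory requested by an allocation of this order. */
unsigned long order_to_bytes(unsigned int order)
{
	return EXAMPLE_PAGE_SIZE << order;
}

/* Smallest order whose block is large enough to hold size bytes. */
unsigned int size_to_order(unsigned long size)
{
	unsigned int order = 0;

	while ((EXAMPLE_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Under that assumption, the order:6 failures on the write path correspond to 256KB contiguous requests, and the order:9 failure on the read path to a 2MB contiguous request.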
_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel