Hi Chunhai,
On 2023/12/28 21:00, Chunhai Guo wrote:
> Using a global page pool for LZ4 decompression significantly reduces the
> time spent on page allocation in low memory scenarios.
>
> The table below shows the reduction in time spent on page allocation for
> LZ4 decompression when using a global page pool. The results were
> obtained from multi-app launch benchmarks on ARM64 Android devices
> running the 5.15 kernel.
>
> +--------------+---------------+--------------+---------+
> |              | w/o page pool | w/ page pool |  diff   |
> +--------------+---------------+--------------+---------+
> | Average (ms) |     3434      |      21      | -99.38% |
> +--------------+---------------+--------------+---------+
>
> Based on the benchmark logs, it appears that 256 pages are sufficient
> for most cases, but this can be adjusted as needed. Additionally,
> turning on CONFIG_EROFS_FS_DEBUG will simplify the tuning process.
Thanks for the patch. I have some questions:

 - what pcluster sizes are you using? 4k or more?

 - what detailed configuration are you using for the multi-app launch
   workload? e.g. CPU / memory / the number of apps.
> This patch currently only supports the LZ4 decompressor; other
> decompressors will be supported in the next step.
>
> Signed-off-by: Chunhai Guo <[email protected]>
> ---
>  fs/erofs/compress.h     |   1 +
>  fs/erofs/decompressor.c |  42 ++++++++++++--
>  fs/erofs/internal.h     |   5 ++
>  fs/erofs/super.c        |   1 +
>  fs/erofs/utils.c        | 121 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 165 insertions(+), 5 deletions(-)
> diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
> index 279933e007d2..67202b97d47b 100644
> --- a/fs/erofs/compress.h
> +++ b/fs/erofs/compress.h
> @@ -31,6 +31,7 @@ struct z_erofs_decompressor {
>  /* some special page->private (unsigned long, see below) */
>  #define Z_EROFS_SHORTLIVED_PAGE		(-1UL << 2)
>  #define Z_EROFS_PREALLOCATED_PAGE	(-2UL << 2)
> +#define Z_EROFS_POOL_PAGE		(-3UL << 2)
>  
>  /*
>   * For all pages in a pcluster, page->private should be one of
> diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
> index d08a6ee23ac5..41b34f01416f 100644
> --- a/fs/erofs/decompressor.c
> +++ b/fs/erofs/decompressor.c
> @@ -54,6 +54,7 @@ static int z_erofs_load_lz4_config(struct super_block *sb,
>  	sbi->lz4.max_distance_pages = distance ?
>  					DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
>  					LZ4_MAX_DISTANCE_PAGES;
> +	erofs_global_page_pool_init();
>  	return erofs_pcpubuf_growsize(sbi->lz4.max_pclusterblks);
>  }
>  
> @@ -111,15 +112,42 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
>  			victim = availables[--top];
>  			get_page(victim);
>  		} else {
> -			victim = erofs_allocpage(pagepool,
> +			victim = erofs_allocpage_for_decmpr(pagepool,
>  						 GFP_KERNEL | __GFP_NOFAIL);
For each read request, the worst case here is 15 extra pages for the
64k LZ4 sliding window (60k = 64k - 4k). You could reduce the LZ4
sliding window to save pages, at the cost of a slight loss in
compression ratio.
Alternatively, __GFP_NOFAIL is actually unnecessary here, since we
could bail out when allocation fails for readahead requests and only
guarantee allocation for __read requests__. I have a plan to do this,
but it's too close to the next merge window, so I intend to work this
out for Linux 6.9.
Anyway, I'm not saying a mempool is a bad idea, but I tend to reserve
as little memory as possible if there is some other way to mitigate
the same workload, since reserving memory is not free (it means 1 MiB
of memory will be usable only for this purpose).
Even if we do adopt a mempool, I wonder if we could unify pcpubuf and
the mempool into a single, better pool.
IMHO, maybe we could do some analysis of your workload first and then
think about how to proceed.
Thanks,
Gao Xiang