Hi,
As part of my GSoC 2026 proposal to introduce Multi-Threaded
Decompression Support in fsck.erofs, I am submitting this two-patch
series which establishes the core workqueue offloading infrastructure.
Baseline profiling of fsck.erofs extracting LZ4HC 4K pclusters showed
the main thread bottlenecked on synchronous VFS writes, which in turn
stalled decompression. This series offloads the decompression work to
the existing erofs_workqueue.
- Patch 1 introduces the baseline producer-consumer logic. To avoid
  excessive futex and scheduling overhead on small 4K pclusters, it
  implements a batching context that groups sequential pclusters into
  a single erofs_work unit. Buffers are allocated with calloc() so
  that no uninitialized bytes can leak into the output, and their
  ownership is handed over to the workers.
- Patch 2 implements dynamic, algorithm-aware batching. Fast algorithms
  (LZ4) may use the maximum batch size (32 pclusters) to hide
  scheduling latency, whereas compute-heavy algorithms (LZMA) use
  smaller batches (8 pclusters) to limit memory usage and keep the
  thread pool continuously fed.
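As a rough illustration of the batching policy described in the two
items above, here is a minimal sketch. It is not the code from the
patches: every identifier (sketch_batch, sketch_batch_add, and so on)
is hypothetical, and the flush only bumps a counter where the real
series would hand an erofs_work unit to the workqueue. Only the two
batch limits (32 and 8 pclusters) are taken from this cover letter.

```c
#include <assert.h>

/* Hypothetical names throughout; none are taken from the patches. */
enum sketch_alg { SKETCH_ALG_LZ4, SKETCH_ALG_LZMA };

#define SKETCH_BATCH_MAX   32	/* fast algorithms (LZ4) */
#define SKETCH_BATCH_HEAVY  8	/* compute-heavy algorithms (LZMA) */

struct sketch_batch {
	unsigned int limit;	/* flush threshold for this algorithm */
	unsigned int count;	/* pclusters queued in the open batch */
	unsigned int flushed;	/* work units submitted so far */
};

/* Pick the batch size from the compression algorithm. */
static unsigned int sketch_batch_limit(enum sketch_alg alg)
{
	return alg == SKETCH_ALG_LZMA ? SKETCH_BATCH_HEAVY : SKETCH_BATCH_MAX;
}

static void sketch_batch_init(struct sketch_batch *b, enum sketch_alg alg)
{
	b->limit = sketch_batch_limit(alg);
	b->count = 0;
	b->flushed = 0;
}

/* Stand-in for submitting one work unit; an empty batch is a no-op. */
static void sketch_batch_flush(struct sketch_batch *b)
{
	if (!b->count)
		return;
	b->flushed++;
	b->count = 0;
}

/* Queue one pcluster and submit the batch once it reaches the limit. */
static void sketch_batch_add(struct sketch_batch *b)
{
	assert(b->limit > 0);
	if (++b->count == b->limit)
		sketch_batch_flush(b);
}
```

With these limits, queuing 70 LZ4 pclusters submits two full batches
and leaves a 6-pcluster tail that a final sketch_batch_flush() call
would have to submit at the end of the file.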
The implementation has been verified to produce bit-perfect extractions
against heavily packed LZ4HC test images.
Nithurshen (2):
  fsck.erofs: introduce multi-threaded decompression with static
    batching
  fsck.erofs: implement dynamic pcluster batching based on algorithm
    complexity

 fsck/main.c              | 234 +++++++++++++++++----------------------
 include/erofs/internal.h |  18 ++-
 lib/data.c               | 206 ++++++++++++++++++++++++----------
 3 files changed, 268 insertions(+), 190 deletions(-)
--
2.52.0