...



There's some discrepancy between filesystems as to whether you need
scratch space for decompression.  Some filesystems read the compressed
data into the page cache and decompress in place, while other
filesystems read the compressed data into scratch pages and decompress
into the page cache.

Btrfs takes the scratch-pages approach.  Decompression in place looks a
little tricky to me: e.g. what if there is only one compressed page and
it decompresses to 4 pages?

Decompression in-place mainly optimizes full decompression (so that CPU
cache lines won't be polluted by temporary buffers either); in fact,
EROFS supports a hybrid of the two approaches.


Won't the plaintext overwrite the compressed data halfway through?

Personally I'm very familiar with the internals of the LZ4, LZMA, and
DEFLATE algorithms, and I also have experience building LZMA and
DEFLATE compressors.

It's totally workable for LZ4: the decompressor reads the compressed
data from the end of the decompressed buffer, and a proper safety
margin makes this almost always succeed.  In practice, many Android
devices have been using EROFS for almost 7 years, and it works very
well to reduce extra memory overhead and help overall runtime
performance.
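
To illustrate the trick, here is a minimal userspace sketch against the
liblz4 API (decompress_in_place() is a made-up helper, and the
kernel-side EROFS code is structured differently): the compressed data
is copied to the tail of the destination buffer plus a small safety
margin, so the decompressor's write pointer can never overtake its
read pointer.

/*
 * Hedged sketch of LZ4 in-place decompression with userspace liblz4;
 * decompress_in_place() is a made-up helper, not an EROFS function.
 */
#include <lz4.h>        /* LZ4_decompress_safe() */
#include <stdlib.h>
#include <string.h>

char *decompress_in_place(const char *csrc, int csize, int dsize)
{
        /* Safety margin following the lz4.h in-place documentation. */
        size_t margin = ((size_t)csize >> 8) + 32;
        char *buf = malloc(dsize + margin);
        char *cdata;

        if (!buf)
                return NULL;

        /* Place the compressed data at the very end of the buffer. */
        cdata = buf + dsize + margin - csize;
        memcpy(cdata, csrc, csize);

        /*
         * Output grows from buf upward while input is consumed from
         * cdata upward; the margin keeps the writer behind the reader.
         */
        if (LZ4_decompress_safe(cdata, buf, csize, dsize) != dsize) {
                free(buf);
                return NULL;
        }
        return buf;
}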

In short, I don't think EROFS will change since it's already
optimal and gaining more and more users.



There also seems to be some discrepancy between filesystems as to
whether decompression involves vmap() of all the allocated memory or
whether the decompression routines can handle doing kmap_local() on
individual pages.

Btrfs is the latter case.

All the decompression/compression routines support swapping the
input/output buffers when one of them is full, so kmap_local() is
completely feasible.
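
Roughly, that buffer-swapping pattern looks like the following sketch
against the kernel zlib API (decompress_to_pages() is a made-up helper,
not actual btrfs code; the caller is assumed to have set up the input
side of the stream, and input pages can be swapped the same way when
avail_in runs out):

/*
 * Hedged sketch of per-page output-buffer swapping with the kernel
 * zlib API; decompress_to_pages() is a made-up helper.
 */
#include <linux/zlib.h>
#include <linux/highmem.h>

static int decompress_to_pages(z_stream *strm, struct page **pages,
                               unsigned int nr_pages)
{
        unsigned int i;
        int ret = Z_OK;

        for (i = 0; i < nr_pages && ret != Z_STREAM_END; i++) {
                void *out = kmap_local_page(pages[i]);

                strm->next_out = out;
                strm->avail_out = PAGE_SIZE;
                /*
                 * zlib_inflate() returns once avail_out is exhausted;
                 * we then swap in the next page as the output buffer.
                 */
                ret = zlib_inflate(strm, Z_NO_FLUSH);
                kunmap_local(out);
                if (ret != Z_OK && ret != Z_STREAM_END)
                        return -EIO;
        }
        return 0;
}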

I think one of the btrfs-supported algorithms, LZO, is not, because the
fastest LZ77-family algorithms like LZ4 and LZO just operate on
virtually consecutive buffers and treat the decompressed buffer as the
LZ77 sliding window.

So either you need to allocate another temporary contiguous buffer (I
believe that is what btrfs does) or use the vmap() approach; EROFS is
interested in the vmap() one.
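
For illustration, a hedged sketch of the vmap() approach
(lz4_decompress_vmapped() is a made-up helper, not actual EROFS or
btrfs code): the destination pages are mapped into one virtually
contiguous buffer so the decompressor can treat them as a single
sliding window.

/*
 * Hedged sketch of the vmap() approach; lz4_decompress_vmapped() is a
 * made-up helper.
 */
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/lz4.h>

static int lz4_decompress_vmapped(const char *cdata, int clen,
                                  struct page **out_pages,
                                  unsigned int nr_pages, int dlen)
{
        /* One virtually contiguous view of the destination pages. */
        char *dst = vmap(out_pages, nr_pages, VM_MAP, PAGE_KERNEL);
        int ret;

        if (!dst)
                return -ENOMEM;

        /* LZ4 needs its whole output window addressable at once. */
        ret = LZ4_decompress_safe(cdata, dst, clen, dlen);
        vunmap(dst);
        return ret < 0 ? -EIO : 0;
}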

Thanks,
Gao Xiang


Thanks,
Qu


So, my proposal is that filesystems tell the page cache that their minimum
folio size is the compression block size.  That seems to be around 64k,
so not an unreasonable minimum allocation size.  That removes all the
extra code in filesystems to allocate extra memory in the page cache.
It means we don't attempt to track dirtiness at a sub-folio granularity
(there's no point, we have to write back the entire compressed block
at once).  We also get a single virtually contiguous block ... if you're
willing to ditch HIGHMEM support.  Or there's a proposal to introduce a
vmap_file() which would give us a virtually contiguous chunk of memory
(and could be trivially turned into a noop for the case of trying to
vmap a single large folio).
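
For illustration, assuming a kernel that provides the
mapping_set_folio_min_order() helper, a filesystem could pin its
minimum folio size to the compression block size like this (the
myfs_* names are invented):

/*
 * Hedged sketch: tell the page cache that every folio in this mapping
 * must cover at least one compression block.  Assumes a kernel with
 * mapping_set_folio_min_order(); the myfs_* names are invented.
 */
#include <linux/pagemap.h>

#define MYFS_COMPRESS_BLOCK_SHIFT       16      /* e.g. 64k blocks */

static void myfs_set_min_folio_size(struct address_space *mapping)
{
        /* order = log2(compression block size / PAGE_SIZE) */
        unsigned int min_order = MYFS_COMPRESS_BLOCK_SHIFT - PAGE_SHIFT;

        mapping_set_folio_min_order(mapping, min_order);
}

With that in place, reads and writeback always see whole compression
blocks, so sub-folio dirty tracking is never needed.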




