On 2025/7/16 12:54, Qu Wenruo wrote:
On 2025/7/16 10:46, Gao Xiang wrote:
...
There's some discrepancy between filesystems as to whether you need scratch
space for decompression. Some filesystems read the compressed data into
the page cache and decompress in place, while other filesystems read the
compressed data into scratch pages and decompress into the page cache.
Btrfs goes the scratch pages way. Decompression in-place looks a little tricky
to me. E.g. what if there is only one compressed page, and it decompresses to 4
pages?
Decompression in-place mainly optimizes full decompression (so that CPU
cache lines won't be polluted by temporary buffers either); in fact,
EROFS supports the hybrid way.
Won't the plaintext overwrite the compressed data halfway?
Personally I'm very familiar with LZ4, LZMA, and DEFLATE
algorithm internals, and I also have experience building LZMA and
DEFLATE compressors.
It's totally workable for LZ4: in short, the compressed data is read into
the end of the decompressed buffer, and a proper safety margin makes this
succeed in almost all cases.
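To make that concrete, here is a minimal user-space sketch of the idea with
the lz4 library (not EROFS kernel code; the margin formula just mirrors what
the upstream lz4 in-place helper uses, in-place overlap is supported by
upstream lz4 since v1.9.0 as far as I know, and decompress_inplace() is a
made-up name for illustration):

#include <stdlib.h>
#include <string.h>
#include <lz4.h>

/*
 * Minimal user-space illustration of in-place LZ4 decompression.
 * The compressed bytes are staged at the very end of the destination
 * buffer, behind a small safety margin, so the decompressor can write
 * plaintext from the front without catching up with the not-yet-consumed
 * compressed input.
 */
char *decompress_inplace(const char *src, int csize, int dsize)
{
	/* Margin mirrors the upstream lz4 in-place helper: csize/256 + 32. */
	int margin = (csize >> 8) + 32;
	int bufsize = dsize + margin;
	char *buf = malloc(bufsize);

	if (!buf)
		return NULL;

	/* Stage the compressed data at the tail of the buffer. */
	memcpy(buf + bufsize - csize, src, csize);

	/* Decompress from the tail into the front of the same buffer. */
	if (LZ4_decompress_safe(buf + bufsize - csize, buf, csize, dsize) != dsize) {
		free(buf);
		return NULL;
	}
	return buf;	/* the first dsize bytes now hold the plaintext */
}

The key point is that the write cursor starts well behind the read cursor
and, thanks to the margin, never overtakes it.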
I guess that's why btrfs cannot go that way.
Due to data COW, it's entirely possible to hit a case where we only want to
read a single plaintext block out of a compressed data extent (whose
compressed size can even be larger than one block).
In that case such in-place decompression will definitely not work.
Ok, I think it's mainly due to the btrfs compression design. Another point
is that in-place decompression can also be used with multi-shot interfaces
(as you said, "swapping input/output buffer when one of them is full")
like deflate, lzma and zstd, because with a multi-shot API you can tell
when the decompressed buffer is about to overlap the compressed buffer,
and only copy the overlapped compressed data to some small additional
temporary buffers (which can be shared among multiple compressed extents).
That has less overhead than allocating temporary buffers to keep the
compressed data around for the whole I/O process (again, because only a
very small number of buffers is needed during the decompression itself),
especially for slow (even network) storage devices.
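As a rough illustration of the "copy only the overlapped part" idea, a
user-space sketch with zlib's streaming inflate() as a stand-in for any
multi-shot decompressor could look like this (the function name, chunk size
and bounce handling are assumptions for illustration only, not EROFS code):

#include <string.h>
#include <zlib.h>

/*
 * Rough sketch: the compressed stream is staged at dst + src_off inside
 * the dst buffer itself (typically src_off = dst_len - src_len, i.e. the
 * tail of dst).  Output is produced in chunks; whenever the next chunk
 * would overwrite compressed bytes that inflate() has not consumed yet,
 * only that unread remainder is copied once into a small bounce buffer.
 */
static int inflate_inplace(unsigned char *dst, size_t dst_len,
			   size_t src_off, size_t src_len,
			   unsigned char *bounce, size_t bounce_len)
{
	z_stream zs = { 0 };
	size_t out = 0;
	int ret = Z_OK;

	if (inflateInit(&zs) != Z_OK)
		return -1;

	zs.next_in = dst + src_off;
	zs.avail_in = src_len;

	while (ret == Z_OK && out < dst_len) {
		size_t chunk = dst_len - out < 1024 ? dst_len - out : 1024;
		const unsigned char *in = zs.next_in;

		/* About to clobber unread input?  Bounce just that part. */
		if (in >= dst && in < dst + dst_len &&
		    dst + out + chunk > in) {
			if (zs.avail_in > bounce_len) {
				ret = Z_MEM_ERROR;
				break;
			}
			memcpy(bounce, in, zs.avail_in);
			zs.next_in = bounce;
		}

		zs.next_out = dst + out;
		zs.avail_out = chunk;
		ret = inflate(&zs, Z_NO_FLUSH);
		out = zs.total_out;
	}
	inflateEnd(&zs);
	return out == dst_len ? 0 : -1;
}

The bounce buffer can stay small and be reused across extents, which is the
"shared among multiple compressed extents" point above.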
I do understand Btrfs may not consider this because of different target
users, but one of EROFS' main use cases is low-overhead decompression
under memory pressure (maybe + cheap storage), so LZ4 + in-place
decompression is useful.
Anyway, I'm not advocating in-place decompression in every case. I think,
unlike plaintext, encoded data has various ways to be organized on disk
and to utilize the page cache. Due to different on-disk designs and
target users, there will be different usage modes.
As for EROFS, we have natively supported compressed large folios
since 6.11, and order-0 folios are always among our use cases, so I don't
think this will give extra benefits to users.
[...]
All the decompression/compression routines support swapping the input/output
buffer when one of them is full.
So kmap_local() is completely feasible.
I think one of the btrfs-supported algorithms, LZO, is not,
It is; the tricky part is that btrfs implements its own TLV structure for LZO
compression.
And btrfs does extra padding to ensure no TLV (compressed data + header)
structure crosses a block boundary.
So btrfs LZO compression is still able to swap input/output halfway, mostly
due to btrfs' specific design.
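For readers unfamiliar with that layout, a simplified sketch of such a
length-prefixed segment stream might look like the following. This is not
the real btrfs on-disk format (see fs/btrfs/lzo.c for that); I'm assuming
here, purely for illustration, that the padding rule is to keep the 4-byte
header from straddling a sectorsize boundary:

#include <stdint.h>
#include <string.h>

#define SECTORSIZE	4096U
#define HDR_LEN		4U	/* 4-byte little-endian segment length */

/*
 * Append one compressed segment as [le32 length][payload], inserting zero
 * padding whenever the header would otherwise straddle a sector boundary,
 * so a reader can always pick up a whole header from a single block.
 * The caller must make sure @out is large enough.
 */
static size_t append_segment(uint8_t *out, size_t pos,
			     const uint8_t *payload, uint32_t len)
{
	uint32_t room = SECTORSIZE - (pos % SECTORSIZE);

	/* Pad to the next sector if the header itself would be split. */
	if (room < HDR_LEN) {
		memset(out + pos, 0, room);
		pos += room;
	}

	/* Header: payload length, little-endian. */
	out[pos + 0] = len & 0xff;
	out[pos + 1] = (len >> 8) & 0xff;
	out[pos + 2] = (len >> 16) & 0xff;
	out[pos + 3] = (len >> 24) & 0xff;
	pos += HDR_LEN;

	memcpy(out + pos, payload, len);
	return pos + len;
}

Because every segment is addressable through its own header, a decompressor
can stop after any segment and resume with a different output buffer.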
Ok, it seems much like a btrfs-specific design, because it's effectively
per-block compression for LZO instead, and it will increase the compressed
size. I know btrfs may not care, but it's not the EROFS case anyway.
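Back to the earlier point that these routines can swap the input/output
buffer whenever one of them is full: a minimal user-space equivalent with
zlib could look like this, with each 4 KiB "page" standing in for what a
kernel implementation would map with kmap_local_page() right before use
(names and layout are purely illustrative):

#include <zlib.h>

#define PAGE_SZ 4096

/*
 * Page-at-a-time streaming decompression: whenever inflate() drains the
 * current input page or fills the current output page, the next page is
 * swapped in, so no page has to stay mapped across the whole stream.
 * Requires nr_in >= 1 and nr_out >= 1.
 */
static int inflate_by_pages(unsigned char **in_pages, int nr_in,
			    unsigned char **out_pages, int nr_out)
{
	z_stream zs = { 0 };
	int i = 0, o = 0, ret;

	if (inflateInit(&zs) != Z_OK)
		return -1;

	zs.next_in = in_pages[i];
	zs.avail_in = PAGE_SZ;
	zs.next_out = out_pages[o];
	zs.avail_out = PAGE_SZ;

	do {
		ret = inflate(&zs, Z_NO_FLUSH);
		if (ret != Z_OK && ret != Z_STREAM_END)
			break;
		if (zs.avail_in == 0 && i + 1 < nr_in) {
			/* input page exhausted: swap in the next one */
			zs.next_in = in_pages[++i];
			zs.avail_in = PAGE_SZ;
		}
		if (zs.avail_out == 0 && o + 1 < nr_out) {
			/* output page full: swap in the next one */
			zs.next_out = out_pages[++o];
			zs.avail_out = PAGE_SZ;
		}
	} while (ret != Z_STREAM_END);

	inflateEnd(&zs);
	return ret == Z_STREAM_END ? 0 : -1;
}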
Thanks,
Gao Xiang
Thanks,
Qu