On 5/8/26 1:21 AM, ... wrote:

Hi,

Sorry for the late reply. I spent some time reworking the prototype along the
lines suggested in the previous discussion, and also tried to make the
VDO-level measurements more explicit.

There were two main points in the previous feedback. The first was that IAA
support should not add IAA-specific branches in the VDO data path, such as:

     if (iaa_enabled)
             ...
     else
             ...

Instead, the VDO compression path should use the kernel asynchronous
compression API, so that IAA can be plugged in through the existing crypto API
and the VDO code does not need to know about a specific accelerator. The
second point was that the benefit should be shown at the dm-vdo level, because
raw compression-algorithm benchmarks do not necessarily translate into
user-visible VDO throughput or data-efficiency gains.

The attached patch is my current prototype for the first point. It replaces the
direct LZ4 calls in the VDO compression path with a crypto acomp-based
compression context. The prototype uses the configured crypto compression
algorithm for VDO compression and decompression, so the IAA path can be reached
through deflate-iaa without adding IAA-specific checks in data-vio.c.
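Here is a rough userspace illustration of that structure. It is not kernel code. The data path calls through a table looked up by algorithm name, so adding a backend needs no new branch in the caller. All names are hypothetical. In the kernel the lookup would go through crypto_alloc_acomp() with an algorithm name, and the toy "copy" and "rle" backends only stand in for LZ4 and deflate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical ops table: callers see only this, never a specific backend.
 * compress() returns the number of bytes written to dst, or 0 if they do
 * not fit in cap.
 */
struct compressor_ops {
	const char *name;
	size_t (*compress)(const unsigned char *src, size_t len,
			   unsigned char *dst, size_t cap);
};

/* Toy backend 1: stores the input unchanged. */
static size_t copy_compress(const unsigned char *src, size_t len,
			    unsigned char *dst, size_t cap)
{
	if (len > cap)
		return 0;
	memcpy(dst, src, len);
	return len;
}

/* Toy backend 2: run-length encoding as (count, byte) pairs, runs <= 255. */
static size_t rle_compress(const unsigned char *src, size_t len,
			   unsigned char *dst, size_t cap)
{
	size_t in = 0, out = 0;

	while (in < len) {
		unsigned char byte = src[in];
		size_t run = 1;

		while (in + run < len && src[in + run] == byte && run < 255)
			run++;
		if (out + 2 > cap)
			return 0;
		dst[out++] = (unsigned char)run;
		dst[out++] = byte;
		in += run;
	}
	return out;
}

static const struct compressor_ops backends[] = {
	{ "copy", copy_compress },
	{ "rle", rle_compress },
};

/* Look up a backend by name; NULL if nothing is registered under it. */
static const struct compressor_ops *find_compressor(const char *name)
{
	for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
		if (strcmp(backends[i].name, name) == 0)
			return &backends[i];
	return NULL;
}

/* The data path: no per-backend if/else, only an indirect call. */
static size_t compress_block(const struct compressor_ops *ops,
			     const unsigned char *src, size_t len,
			     unsigned char *dst, size_t cap)
{
	assert(ops != NULL);
	return ops->compress(src, len, dst, cap);
}
```

For example, compressing eight 'a' bytes with find_compressor("rle") produces the two bytes {8, 'a'}. find_compressor("lz4") returns NULL because nothing is registered under that name in this model.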

In the future, please submit patches (even prototypes) according to the official guidelines: https://docs.kernel.org/process/submitting-patches.html. Patches sent as attachments may be overlooked.

This is still an RFC prototype, not a final patch series. In particular, I am not trying to claim that this patch solves the full long-term metadata problem
for mixing arbitrary compression algorithms in one VDO format. My immediate
goal is narrower: make the IAA implementation follow the generic async
compression API direction, and check whether that is an acceptable basis for
preparing a proper mergeable series.

In general, this approach looks reasonable. However, I do not think we can separate this work from the larger concerns of supporting multiple compression algorithms. From what I can see, IAA acceleration does not support the LZ4 algorithm, only deflate. So adding IAA support necessarily means supporting two different compression algorithms. We cannot simply change the algorithm out from under volumes that already have LZ4-compressed data.

The attached patch also contains a small amount of temporary
compression-statistics/debug code. I used it only to identify the VDO
compression entry point and report VDO-level input and payload sizes for the
comparison below. It is not required for the functional part of IAA support,
and I can drop it or keep it behind debug guards when splitting the work into
reviewable patches.

For the VDO-level test, the data set was silesia.tar, a tar archive of the
files from the Silesia corpus. I did not configure the VDO thread counts, so
they used the default startup parameters. For this test I only enabled VDO
compression and disabled deduplication:

     lvcreate --type vdo -n vdo_lv -L 180G -V 200G \
         --compression=y --deduplication=n \
         vdo_vg

I then created and mounted an ext4 filesystem on top of the VDO device, and
used dd on regular files in the mounted filesystem to measure read/write
throughput. The write test copied the Silesia tar archive into the mounted
filesystem:

     dd if=../silesia_raw.tar of=./silesia.tar bs=1M oflag=direct

The read test read the same file back from the mounted ext4 filesystem:

     dd if=./silesia.tar of=/dev/null bs=1M iflag=direct

After the write test, I queried the VDO compression statistics with:

     dmsetup message vdo_vg-vpool0-vpool 0 compress-stats

The temporary compress-stats fields mean:

     calls: number of VDO compression calls counted in this run
     input_bytes: total bytes passed to the compressor by VDO
     payload_bytes: total compressed payload bytes produced by the compressor
     saved_bytes: input_bytes - payload_bytes
     ratio_x100: payload_bytes / input_bytes * 100, truncated to an integer
     ratio: the same payload/input ratio printed as a decimal value
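The field definitions above can be written out as a small C sketch. The struct and function names are hypothetical, and the formulas follow the descriptions. Multiplying before dividing keeps ratio_x100 in integer arithmetic with the same truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two raw counters. */
struct compress_stats {
	uint64_t input_bytes;   /* bytes VDO passed to the compressor */
	uint64_t payload_bytes; /* compressed bytes the compressor produced */
};

/* saved_bytes = input_bytes - payload_bytes (signed, in case output grows). */
static int64_t saved_bytes(const struct compress_stats *s)
{
	return (int64_t)s->input_bytes - (int64_t)s->payload_bytes;
}

/* ratio_x100 = payload / input * 100, truncated; multiply first. */
static uint64_t ratio_x100(const struct compress_stats *s)
{
	assert(s->input_bytes > 0);
	return s->payload_bytes * 100 / s->input_bytes;
}

/* The ratio as a decimal; lower means better compression. */
static double ratio(const struct compress_stats *s)
{
	assert(s->input_bytes > 0);
	return (double)s->payload_bytes / (double)s->input_bytes;
}
```

With the LZ4 counters below (input 211460096, payload 122705607), this gives saved_bytes 88754489 and ratio_x100 58. For the IAA/deflate run, the exact ratio is about 0.509, which truncates to 50.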

LZ4 result:

     write:
         202+1 records in
         202+1 records out
         211,957,760 bytes copied, 0.357554 s, 593 MB/s
     read:
         202+1 records in
         202+1 records out
         211,957,760 bytes copied, 0.183408 s, 1.2 GB/s
     compress-stats:
         calls 51626
         input_bytes 211460096
         payload_bytes 122705607
         saved_bytes 88754489
         ratio_x100 58
         ratio 0.58x

IAA/deflate result:

     write:
         202+1 records in
         202+1 records out
         211,957,760 bytes copied, 0.228480 s, 928 MB/s
     read:
         202+1 records in
         202+1 records out
         211,957,760 bytes copied, 0.183265 s, 1.2 GB/s
     compress-stats:
         calls 51626
         input_bytes 211460096
         payload_bytes 107585987
         saved_bytes 103874109
         ratio_x100 50
         ratio 0.50x

The two runs had the same number of compression calls and the same VDO
compression input size. On this 211.5 MB VDO compression input, IAA/deflate
stored about 107.6 MB of payload, while LZ4 stored about 122.7 MB. That is
about 15.1 MB less payload for IAA/deflate on this data set.
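The same comparison can be expressed as arithmetic on the payload_bytes values from the two runs. The helper names are hypothetical. Relative to LZ4, the IAA/deflate payload is about 12.3% smaller.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes saved by run b compared with run a (negative if b is larger). */
static int64_t payload_delta(uint64_t a_payload, uint64_t b_payload)
{
	return (int64_t)a_payload - (int64_t)b_payload;
}

/* The same difference as a percentage of run a's payload. */
static double payload_reduction_pct(uint64_t a_payload, uint64_t b_payload)
{
	assert(a_payload > 0);
	return 100.0 * (double)payload_delta(a_payload, b_payload) /
	       (double)a_payload;
}
```

payload_delta(122705607, 107585987) is 15119620 bytes (about 15.1 MB), and payload_reduction_pct() of the same pair is about 12.32.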

I have two concerns with this benchmark. First, you have disabled deduplication here. However, for any normal dm-vdo volume, deduplication should be on. The deduplication overhead should be part of the comparison when measuring throughput. (Deduplication is the core feature of dm-vdo. Any user who does not want the deduplication features is most likely better off using a different storage target altogether.)

Second, what does the performance look like when the IAA hardware is not present, or is disabled? It's worth making sure there is no performance regression in that case.
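One way to reason about the no-IAA case is through how names are resolved. The crypto API resolves a requested name to a registered implementation, preferring higher priority among matches on the generic algorithm name, and a specific driver name can also be requested. The toy model below uses hypothetical types and made-up priorities. It assumes the IAA driver registers as deflate-iaa under the generic name deflate. Under that assumption, requesting "deflate" falls back to software deflate when IAA is absent, which is the configuration whose throughput would need checking. Requesting "deflate-iaa" finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy registry entry; models, but is not, the kernel's struct crypto_alg. */
struct alg_impl {
	const char *name;        /* generic name, e.g. "deflate" */
	const char *driver_name; /* specific implementation */
	int priority;            /* higher wins */
	int available;           /* 0 if hardware is absent or disabled */
};

/*
 * Return the highest-priority available entry whose generic name or
 * driver name equals `request`, or NULL if there is none.
 */
static const struct alg_impl *select_impl(const struct alg_impl *impls,
					  size_t n, const char *request)
{
	const struct alg_impl *best = NULL;

	assert(impls != NULL && request != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct alg_impl *a = &impls[i];

		if (!a->available)
			continue;
		if (strcmp(a->name, request) != 0 &&
		    strcmp(a->driver_name, request) != 0)
			continue;
		if (best == NULL || a->priority > best->priority)
			best = a;
	}
	return best;
}
```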

The throughput result also looks more useful than my earlier report. With the
default VDO thread settings, the write-side dd result improved from 593 MB/s
with LZ4 to 928 MB/s with IAA/deflate, while the read-side result stayed
essentially the same at about 1.2 GB/s. This matches the lower-level 4KB
compression tests I ran earlier: IAA showed higher compression bandwidth than
LZ4 at that granularity, while its decompression bandwidth was not better than
LZ4. So the VDO result is consistent with IAA helping the compressed write path
more than the read/decompression path.
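The dd rates can be checked from the reported byte counts and times. dd's "MB/s" is decimal (10^6 bytes per second). A small sketch with hypothetical helper names:

```c
#include <assert.h>

/* Decimal megabytes per second, matching dd's "MB/s" figure. */
static double mb_per_s(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes / seconds / 1e6;
}

/* Speedup of run b over run a for the same amount of data. */
static double speedup(double a_seconds, double b_seconds)
{
	assert(b_seconds > 0.0);
	return a_seconds / b_seconds;
}
```

mb_per_s(211957760, 0.357554) is about 592.8 and mb_per_s(211957760, 0.228480) is about 927.7, so speedup(0.357554, 0.228480) is about 1.56 for writes. The two read times (0.183408 s and 0.183265 s) differ by less than 0.1%.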

My question is whether this revised approach is a reasonable direction for
adding IAA support to dm-vdo: use the generic async compression API in the VDO
compression path, avoid IAA-specific branches, and then split the prototype
into smaller reviewable patches. If this direction looks acceptable, I can
continue by removing or isolating the temporary statistics code, improving the
benchmark description, and preparing a proper patch series.

The use of the async compression API here seems reasonable. We could switch the existing LZ4 calls to do LZ4 through the regular crypto interface. However, supporting deflate-iaa would need much more than this, as previously mentioned.

There are also a few places where your patch does too much. For instance, there is no need to add a module parameter for the algorithm, since the algorithm cannot be changed yet. (When it is added, that parameter should probably be set on the table line and not with a module parameter anyway.)
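If a per-volume algorithm setting is added later, reading it from the table line's optional key/value arguments might look roughly like this. The key name compressionAlgorithm is hypothetical and not an existing dm-vdo option. The helper is plain C, not device-mapper code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Scan "key value" pairs (e.g. the optional part of a table line) for the
 * hypothetical key "compressionAlgorithm". Returns its value, default_alg
 * if the key is absent, or NULL if the pairs are malformed.
 */
static const char *parse_compression_alg(int argc, const char *const *argv,
					 const char *default_alg)
{
	const char *alg = default_alg;

	assert(argc >= 0 && (argc == 0 || argv != NULL));
	for (int i = 0; i < argc; i += 2) {
		if (i + 1 >= argc)
			return NULL; /* key without a value */
		if (strcmp(argv[i], "compressionAlgorithm") == 0)
			alg = argv[i + 1];
	}
	return alg;
}
```

A table-line setting applies to a single target, whereas a module parameter would apply to every VDO volume the module loads.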

Matt

Regards,
Ze Fu
Original Message
------------------------------------------------------------------------
From: Matthew Sakai <[email protected]>
Sent: May 5, 2026, 01:03
To: Mikulas Patocka <[email protected]>
Cc: ... <[email protected]>, dm-devel <[email protected]>
Subject: Re: dm vdo: add optional Intel IAA-backed compression support




On 5/4/26 12:01 PM, Mikulas Patocka wrote:
 >
 >
 > On Wed, 29 Apr 2026, Matthew Sakai wrote:
 >
 >>
 >>
 >> On 4/29/26 11:24 AM, Mikulas Patocka wrote:
 >>> Hi
 >>>
 >>> I think the best way to support it would be to modify the VDO target
 >>> to use the asynchronous compression API (so that it could use arbitrary
 >>> algorithms). Then, the support for IAA could be plugged in easily with
 >>> little or no extra code.
 >>
 >> I agree that this is the right approach, but supporting arbitrary compression
 >> algorithms will require some significant changes on its own. dm-vdo currently
 >> has nowhere to store information about what algorithm is used for which
 >> blocks, so it would require reworking the metadata. (We would probably store
 >
 > You can store the algorithm in the superblock and use it for all blocks on
 > the device. This would not be as flexible as per-block algorithm
 > specification, but it should be relatively easy to implement - and it
 > would allow you to evaluate whether different algorithms improve
 > performance or compression ratio.

We could do that, certainly. It means picking the compression algorithm
at format time, which also means no existing vdo volumes could take
advantage of it. But we'd still need an upgrade for existing volumes to
handle the new superblock field. I think managing the compression
algorithm per-block is not that much more work, especially since I've
already written much of it.

 >> the extra compression information in the compression block header.) This
 >> metadata rework in turn will require some effort to make sure we can continue
 >> to support existing users who will want to continue to use dm-vdo volumes in
 >> the current format.
 >
 > Mikulas
 >



