Hi,

On 8/5/25 07:59, Eric Biggers wrote:

> > md5sum uses the kernel's MD5 code:
>
> What?  That's crazy.  Userspace MD5 code would be faster and more
> reliable.  No need to make syscalls, transfer data to and from the
> kernel, have an external dependency, etc.  Is this the coreutils md5sum?
> We need to get this reported and fixed.

The userspace API allows zero-copy transfers from userspace, and AFAIK it can also operate directly on files without the data ever passing through userspace (so we save one copy).
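For illustration, here is a minimal sketch of that file path from userspace. It assumes a Linux kernel built with the AF_ALG hash interface (CONFIG_CRYPTO_USER_API_HASH) and an "md5" transform. The function name afalg_md5_file is hypothetical. It hands the whole file to the kernel with a single sendfile(2), so the data goes from the page cache into the hash without being copied into the process. It returns -1 whenever this path is unavailable, so the caller can fall back to userspace MD5.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/if_alg.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef AF_ALG
#define AF_ALG 38                  /* older libc headers may lack it */
#endif

#define MD5_LEN 16
#define SENDFILE_MAX 0x7ffff000L   /* Linux caps a single sendfile() at this */

/* Hypothetical helper: hash a regular file with the kernel's "md5" via
 * AF_ALG. The file data is spliced from the page cache into the hash
 * socket, never copied into this process. Returns 0 on success, -1 if
 * anything is unsupported (no AF_ALG, no md5, empty or oversized file). */
int afalg_md5_file(const char *path, unsigned char digest[MD5_LEN])
{
    struct sockaddr_alg sa;
    struct stat st;
    int fd, tfm = -1, op = -1, ret = -1;
    off_t off = 0;

    memset(&sa, 0, sizeof(sa));
    sa.salg_family = AF_ALG;
    strcpy((char *)sa.salg_type, "hash");
    strcpy((char *)sa.salg_name, "md5");

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode) ||
        st.st_size == 0 || st.st_size > SENDFILE_MAX)
        goto out;

    tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);          /* transform socket */
    if (tfm < 0 || bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        goto out;
    op = accept(tfm, NULL, 0);                         /* operation socket */
    if (op < 0)
        goto out;

    /* Issue one sendfile() for the whole file and treat a short transfer
     * as failure, rather than risk hashing only part of the file. */
    if (sendfile(op, fd, &off, (size_t)st.st_size) != st.st_size)
        goto out;
    if (read(op, digest, MD5_LEN) == MD5_LEN)
        ret = 0;
out:
    if (op >= 0)
        close(op);
    if (tfm >= 0)
        close(tfm);
    close(fd);
    return ret;
}
```

The fallback-on-any-error design keeps the sketch safe in environments where AF_ALG is disabled or the algorithm is not registered.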

Userspace requests are also where the asynchronous hardware offload units get to chomp on large blocks of data while the CPU is doing something else:

$ time dd if=test.bin of=/dev/zero bs=1G     # warm up caches
real    0m1.541s
user    0m0.000s
sys     0m0.732s

$ time gzip -9 <test.bin >test.bin.gz        # compress with the CPU
real    2m57.789s
user    2m55.986s
sys     0m1.508s

$ time ./gzfht_test test.bin                 # compress with NEST unit
real    0m3.207s
user    0m0.584s
sys     0m2.487s

$ time gzip -d <test.bin.nx.gz >test.bin.nx  # decompress with CPU
real    1m0.103s
user    0m57.990s
sys     0m1.878s

$ time ./gunz_test test.bin.gz               # decompress with NEST unit
real    0m2.722s
user    0m0.200s
sys     0m1.872s

That's why I object to measuring the general usefulness of hardware crypto units by the standards of fscrypt, which has an artificial limitation of never submitting blocks larger than 4 kB. Other use cases don't have that limitation, and for them the per-request overhead is negligible because it is incurred only once per few gigabytes of data.
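The amortization argument can be made concrete with a simple linear cost model. This is an illustrative assumption, not measured data. A request of s bytes costs t(s) = s/v + c on an engine with throughput v and fixed per-request overhead c. Per byte, the overhead term shrinks as c/s. Moving from 4 KiB (2^12 byte) requests to 1 GiB (2^30 byte) requests therefore cuts it by a factor of 2^18:

```latex
t(s) = \frac{s}{v} + c,
\qquad
\frac{t(s)}{s} = \frac{1}{v} + \frac{c}{s},
\qquad
\frac{c / 2^{12}}{c / 2^{30}} = 2^{18} = 262144
```

At gigabyte-sized requests, the cost is essentially s/v, i.e. raw throughput.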

This is why I suggested replacing the priority field with "speed" and "overhead" fields, and calculating the priority for each application as size/speed + overhead, where the smallest number wins. Here, size is the typical request size the application expects to use. For fscrypt and IPsec that size is on the small side, so the CPU would always be selected unless a low-overhead offload engine were available.
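A rough sketch of that selection rule in C follows. The struct impl type, its fields, and select_impl are hypothetical names for illustration, not existing kernel crypto API. Each candidate's estimated time for one request is size/speed + overhead, and the candidate with the smallest estimate wins.

```c
#include <stddef.h>

/* Hypothetical per-implementation parameters replacing a static priority. */
struct impl {
    const char *name;
    double speed;     /* throughput in bytes per second */
    double overhead;  /* fixed cost per request in seconds */
};

/* Estimated time for one request of `size` bytes: size/speed + overhead. */
static double est_cost(const struct impl *im, double size)
{
    return size / im->speed + im->overhead;
}

/* Pick the implementation with the smallest estimated cost for the
 * caller's typical request size; returns NULL if n == 0. */
const struct impl *select_impl(const struct impl *impls, size_t n, double size)
{
    const struct impl *best = NULL;
    double best_cost = 0.0;

    for (size_t i = 0; i < n; i++) {
        double c = est_cost(&impls[i], size);
        if (best == NULL || c < best_cost) {
            best = &impls[i];
            best_cost = c;
        }
    }
    return best;
}
```

Take made-up parameters of a CPU at 1e9 B/s with 1e-7 s overhead and an offload engine at 1e10 B/s with 1e-4 s overhead. With these, a 4096-byte request selects the CPU, while a 1 GiB request selects the offload engine. Kernel code would use integer arithmetic instead of floating point; doubles are used here only for readability.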

This probably needs some adjustment to allow selecting a low-power implementation (e.g. on mobile, I'd want to offload fscrypt even if that is slower) and to model request batching, which reduces the overhead on a busy system, but it should be a good start.

   Simon
