On Mon, Jun 15, 2026 at 11:14:56AM +0000, Leonid Ravich wrote:
> The series adds a per-request "data unit size" to the skcipher API
> so a caller can submit several data units (typically 512..4096-byte
> sectors) sharing one starting IV in a single request. Algorithms
> derive each data unit's IV from the caller-supplied IV by treating
> it as a 128-bit little-endian counter and adding the data-unit
> index, matching the layout produced by dm-crypt's plain64 IV mode
> and by typical inline-encryption hardware.
>
> This mirrors the data_unit_size concept already exposed by
> struct blk_crypto_config for inline encryption.
>
> The first user is dm-crypt, which today issues one skcipher request
> per sector and so pays a per-sector cost in request allocation,
> callback dispatch, completion handling, and scatterlist setup.
>
> Proof-of-concept performance numbers from the RFC reply [1]: +19%
> throughput / -40% CPU on a single-core arm64 system with a hardware
> XTS-AES-256 accelerator running fio 4 KiB sequential writes through
> dm-crypt, when an out-of-tree arm64 xts driver advertises
> CRYPTO_ALG_SKCIPHER_NATIVE_MULTI_DU. This series itself does not
> include arch enablement; the fast path is opt-in per driver, the
> slow path is universal via the auto-splitter.
>
> The native fast path amortises both per-sector dispatch and per-sector
> crypto setup across a bio - the measured win above, on an engine that
> offloads the AES compute. The auto-splitter is for correctness and
> reach: any consumer can set data_unit_size and get correct output with
> the per-request allocation/callback/completion cost removed, but it
> still issues one alg->encrypt per data unit, so on a software cipher it
> saves only dispatch overhead (no throughput figure claimed - that is
> hardware- and workload-dependent). What it guarantees unconditionally
> is byte-identical output (Verification below) at O(entries + units),
> walking the scatterlists with a pair of struct scatter_walk cursors
> rather than rescanning from the head per unit.

So in other words, this series slows down dm-crypt and crypto_skcipher
for everyone to optimize for an out-of-tree driver. And there's also
no benchmark showing that your driver is even worth it over just using
the CPU.

- Eric
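
For reference, the per-data-unit IV derivation the quoted cover letter describes (treat the caller's IV as a 128-bit little-endian counter and add the data-unit index) can be sketched in C as below. This is a minimal illustration under that description only, not code from the series; the helper name du_iv_derive() is hypothetical. For dm-crypt's plain64 mode, whose IV is the 64-bit little-endian sector number padded with zeros, adding the index to the IV of sector s gives the IV of sector s + index (absent 64-bit wraparound).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the series): compute the IV for data
 * unit 'index' by adding 'index' to 'base_iv', treating both as
 * 128-bit little-endian integers. Byte 0 is least significant; the
 * carry propagates upward and the sum wraps modulo 2^128.
 */
static void du_iv_derive(uint8_t iv_out[16], const uint8_t base_iv[16],
			 uint64_t index)
{
	unsigned int carry = 0;

	for (int i = 0; i < 16; i++) {
		/* Low byte of the remaining index, plus carry from byte i-1. */
		unsigned int sum = base_iv[i] + (unsigned int)(index & 0xff) + carry;

		iv_out[i] = (uint8_t)sum;
		carry = sum >> 8;
		index >>= 8;	/* becomes 0 once all 8 index bytes are consumed */
	}
}
```

With a plain64 base IV for sector 7 (byte 0 = 7, remaining bytes zero), data unit 3 gets byte 0 = 10, which is the plain64 IV of sector 10.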
