On Mon, Jun 22, 2026 at 07:10:44AM +0000, Leonid Ravich wrote:
> On Mon, Jun 15, 2026 at 03:53:17PM -0700, Eric Biggers wrote:
> > So in other words, this series slows down dm-crypt and crypto_skcipher
> > for everyone to optimize for an out-of-tree driver. And there's also no
> > benchmark showing that your driver is even worth it over just using the
> > CPU.
>
> I measured on arm64 (Graviton3, dm-crypt + xts-aes-ce, RAM-backed,
> fixed CPU freq):
>
> - 4 KiB random write, 512-byte sectors: v4 as posted regressed ~5%.
>   Root cause (ftrace): a per-bio kmalloc_array() for the scatterlists,
>   where the per-sector path uses dm-crypt's inline sg_in[]/sg_out[].
>
> - Reusing the inline arrays when the segment count fits (heap only for
>   larger bios) removes the regression, back to parity. This will be in
>   the dm-crypt patch for v5.
>
> So the software path is neutral after the fix, not slower. No software
> throughput win either: the auto-splitter still calls alg->encrypt per
> data unit. The win is for a consumer that takes the whole request in
> one pass, a HW engine, or any async offload engine that pays a fixed
> per-request cost, it currently pays once per sector instead of once
> per bio.
>
> I'd rather not over-complicate the patches until there's a general
> ack on the direction: per-request data_unit_size + auto-split,
> enabling one-pass consumers, neutral for everyone else. Is that
> direction acceptable? If so I'll respin v5.

I don't think there's a path forward without an in-tree user that's
shown to be worthwhile over just using the acceleration built directly
into the CPU, as well as confirmation of no regression to existing
users, including in cases where the inline sg list can't be used.

- Eric
