On Mon, Jun 22, 2026 at 06:23:28PM +0000, Eric Biggers wrote:
> I don't think there's a path forward without an in-tree user that's
> shown to be worthwhile over just using the acceleration built directly
> into the CPU. As well as confirmation of no regression to existing
> users, including in cases where the inline sg list can't be used.

Agreed. I'm proposing a smaller v5 that meets the no-regression bar now
and leaves the "beats the CPU" question to a follow-up with a real
in-tree user.

In v5, dm-crypt submits one request per contiguous bio segment (a single
bio_vec) with data_unit_size = sector_size, instead of one request per
sector. For example, with the default sector_size of 512 and a 4 KiB
bio_vec, that is one request of 8 data units, which the fallback
splitter walks as 8 per-sector calls -- dm-crypt no longer open-codes
the per-data-unit loop itself.

- Uses only the existing inline sg_in[0]/sg_out[0] entry. No per-bio
  scatterlist, no kmalloc -- the "inline sg list can't be used" case
  doesn't exist here, so there's nothing to regress.
- For a non-native algorithm the core auto-splits into the same
  per-sector calls dm-crypt makes today: identical output and cost.
  This is what Herbert predicted -- the per-unit indirect call just
  moves from the caller into the API; the fallback is no slower.
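
For illustration only, here is a minimal userspace C sketch of the
splitting behaviour described above. It is not the v5 patch and not
kernel code: split_request(), unit_fn, toy_unit() and struct toy_ctx are
hypothetical names, the XOR callback is a toy stand-in for a real
per-sector cipher call, and passing an incrementing unit index to each
call is an assumption about how per-unit IVs would be derived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-data-unit callback; stands in for one cipher call.
 * Expected to return 0 on success or a negative error code. */
typedef int (*unit_fn)(uint8_t *dst, const uint8_t *src, size_t len,
                       uint64_t unit_index, void *ctx);

/*
 * Hypothetical fallback splitter: walk one contiguous request as
 * len / data_unit_size per-unit calls. Returns the number of calls
 * made, -1 if len is not a whole number of data units, or the
 * callback's error code if a call fails.
 */
static int split_request(uint8_t *dst, const uint8_t *src, size_t len,
                         size_t data_unit_size, uint64_t first_unit,
                         unit_fn fn, void *ctx)
{
    size_t off;
    int calls = 0;

    assert(fn != NULL);
    if (data_unit_size == 0 || len % data_unit_size != 0)
        return -1;

    for (off = 0; off < len; off += data_unit_size) {
        /* Assumption: each data unit gets the next consecutive index. */
        int err = fn(dst + off, src + off, data_unit_size,
                     first_unit + (uint64_t)calls, ctx);
        if (err)
            return err;
        calls++;
    }
    return calls;
}

/* Toy context that only counts how many per-unit calls were made. */
struct toy_ctx {
    unsigned int calls;
};

/* Toy transform (NOT a cipher): XOR each byte with the low byte of
 * the unit index, so the per-unit boundaries are visible in the output. */
static int toy_unit(uint8_t *dst, const uint8_t *src, size_t len,
                    uint64_t unit_index, void *ctx)
{
    struct toy_ctx *t = ctx;
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] = src[i] ^ (uint8_t)unit_index;
    t->calls++;
    return 0;
}
```

With a 4096-byte buffer and data_unit_size = 512, split_request() makes
8 calls to the callback, the same number of per-sector calls the caller
would otherwise issue itself; only the loop has moved.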

So v5 stands on no-regression alone and makes no software throughput
claim. What it adds is the interface that a native one-pass driver
needs. I'd like to land that now and bring a native offload user plus
numbers as the follow-up, rather than block the interface on the driver.

Acceptable? If so, I'll respin v5 as the minimal version.

Thanks,
Leonid