On Mon, Jun 22, 2026 at 06:23:28PM +0000, Eric Biggers wrote:
> I don't think there's a path forward without an in-tree user that's
> shown to be worthwhile over just using the acceleration built directly
> into the CPU.  As well as confirmation of no regression to existing
> users, including in cases where the inline sg list can't be used.

Agreed. I'd like to propose a smaller v5 that meets the no-regression
bar now and leaves "beats the CPU" to a follow-up with a real in-tree
user.

In v5, dm-crypt would submit one request per contiguous bio segment (a
single bio_vec) with data_unit_size = sector_size, instead of one
request per sector. E.g. with the default sector_size of 512 and a
4 KiB bio_vec, that is one request of 8 data units, which the fallback
splitter walks as 8 per-sector calls -- dm-crypt no longer open-codes
the per-data-unit loop itself.
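
To make the arithmetic concrete, here is a minimal userspace sketch.
The helper name is made up for illustration and is not from the patch;
it only assumes a segment must be a whole multiple of data_unit_size:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in the series): number of data units in one
 * request that covers a single bio_vec of bv_len bytes.
 * Returns 0 if bv_len is not a whole multiple of data_unit_size.
 */
static size_t data_units_in_segment(size_t bv_len, size_t data_unit_size)
{
	if (data_unit_size == 0 || bv_len % data_unit_size != 0)
		return 0;
	return bv_len / data_unit_size;
}
```

For the example above, data_units_in_segment(4096, 512) is 8; with
sector_size 4096 the same segment is a single data unit.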

  - Uses only the existing inline sg_in[0]/sg_out[0] entry. No per-bio
    scatterlist, no kmalloc -- the "inline sg list can't be used" case
    doesn't exist here, so there's nothing to regress.
  - For a non-native algorithm the core auto-splits into the same
    per-sector calls dm-crypt makes today: identical output and cost.
    This is what Herbert predicted -- the per-unit indirect call just
    moves from the caller into the API; the fallback is no slower.
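
A toy userspace model of the second point, in case it helps. Everything
here is an assumption for illustration: the function names are invented,
the "cipher" is a plain XOR stand-in, and the IV is assumed to advance
by one per data unit. It is not the code from the series, only the shape
of the argument:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a per-data-unit cipher call; NOT a real cipher. */
typedef void (*unit_fn)(uint8_t *dst, const uint8_t *src, size_t len,
			uint64_t iv);

static unsigned int unit_calls;	/* number of indirect per-unit calls */

static void toy_unit_crypt(uint8_t *dst, const uint8_t *src, size_t len,
			   uint64_t iv)
{
	size_t i;

	unit_calls++;
	for (i = 0; i < len; i++)
		dst[i] = src[i] ^ (uint8_t)(iv + i);
}

/* Model of today's caller: it loops and issues one call per sector. */
static void caller_loop(unit_fn fn, uint8_t *dst, const uint8_t *src,
			size_t len, size_t unit, uint64_t first_iv)
{
	size_t off;

	for (off = 0; off < len; off += unit)
		fn(dst + off, src + off, unit, first_iv + off / unit);
}

/* Model of the proposal: one request; the core splits per data unit. */
static void submit_multi_unit(unit_fn fn, uint8_t *dst,
			      const uint8_t *src, size_t len, size_t unit,
			      uint64_t first_iv)
{
	size_t n = len / unit;
	size_t i;

	for (i = 0; i < n; i++)
		fn(dst + i * unit, src + i * unit, unit, first_iv + i);
}

/* Returns 1 if both paths give identical bytes and call counts. */
static int paths_match(size_t len, size_t unit)
{
	static uint8_t src[4096], a[4096], b[4096];
	unsigned int calls_a;
	size_t i;

	if (unit == 0 || len > sizeof(src) || len % unit != 0)
		return 0;
	for (i = 0; i < len; i++)
		src[i] = (uint8_t)(i * 7);

	unit_calls = 0;
	caller_loop(toy_unit_crypt, a, src, len, unit, 100);
	calls_a = unit_calls;

	unit_calls = 0;
	submit_multi_unit(toy_unit_crypt, b, src, len, unit, 100);

	return calls_a == unit_calls && calls_a == len / unit &&
	       memcmp(a, b, len) == 0;
}
```

In this model the two paths make the same number of indirect calls with
the same per-unit arguments, so the loop has only changed owner. It says
nothing about real driver behaviour or throughput.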

So it stands on no-regression alone, with no software throughput claim.
What it adds is the interface a native one-pass driver needs. I'd land
that now and bring a native offload user + numbers as the follow-up,
rather than block the interface on the driver.

Acceptable? If so I'll respin v5 as the minimal version.

Thanks,
Leonid
