On Wed, Sep 5, 2018 at 7:32 AM, Eric Biggers <[email protected]> wrote:
> Note that if ever needed there's also still room for optimizing the GF(2^128)
> multiplications further, e.g. multiplying by 'x' and 'x^2' in parallel, or maybe
> having a version specialized for 32-bit processors.
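A minimal sketch of the "x and x^2 in parallel" idea, assuming a GCC/Clang `unsigned __int128` whose bit i holds the coefficient of x^i (the little-endian XTS tweak convention). `mul_xk` and `gen_tweaks` are hypothetical names. Each loop step derives both the next tweak (times x) and the one after it (times x^2) from the same input, so the two multiplications don't depend on each other:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang unsigned __int128, bit i = coefficient of x^i. */
typedef unsigned __int128 u128;

/* Hypothetical helper: multiply t by x^k modulo x^128 + x^7 + x^2 + x + 1.
 * The k coefficients shifted past x^127 form h(x); since
 * x^128 == x^7 + x^2 + x + 1, h * x^128 folds back as four shifted xors.
 * k <= 121 keeps h << 7 below x^128, so a single fold is enough. */
static u128 mul_xk(u128 t, unsigned k)
{
	assert(k >= 1 && k <= 121);
	u128 h = t >> (128 - k);
	return (t << k) ^ (h << 7) ^ (h << 2) ^ (h << 1) ^ h;
}

/* Hypothetical helper: fill out[0..n-1] with t, t*x, t*x^2, ...
 * Both products in one iteration are computed from the same t, so the
 * CPU is free to evaluate them in parallel. */
static void gen_tweaks(u128 t, u128 *out, size_t n)
{
	for (size_t i = 0; i < n; i += 2) {
		out[i] = t;
		if (i + 1 < n)
			out[i + 1] = mul_xk(t, 1);  /* tweak i + 1 */
		t = mul_xk(t, 2);                   /* tweak i + 2, straight from t */
	}
}
```

Compared with the usual one-step chain, the dependency chain through `t` is half as long.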
Given that this is only used to encrypt small buffers, skipping ahead in
the tweak sequence may also be a viable strategy. For example, for the
XTS polynomial x^128 + x^7 + x^2 + x + 1, one can multiply by x^64 very
efficiently with
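/* Multiply x by x^64 modulo x^128 + x^7 + x^2 + x + 1, where u128 is an
 * unsigned 128-bit integer and bit i holds the coefficient of x^i.
 * With h = x >> 64, x * x^64 = (x << 64) ^ h * x^128, and
 * x^128 == x^7 + x^2 + x + 1, so h folds back in as h ^ h<<1 ^ h<<2 ^ h<<7. */
u128 skip64(u128 x)
{
	u128 b64 = (x >> 64);                 /* h      */
	u128 b63 = (x >> 63) & ~(u128)0x01;   /* h << 1 */
	u128 b62 = (x >> 62) & ~(u128)0x03;   /* h << 2 */
	u128 b57 = (x >> 57) & ~(u128)0x7f;   /* h << 7 */
	return (x << 64) ^ (b64 ^ b63 ^ b62 ^ b57);
}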
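One way to convince yourself that skip64() is right is to compare it against 64 single-step multiplications by x. The sketch below assumes `u128` is GCC/Clang `unsigned __int128`. `mul_x` is the standard XTS shift-and-xor-0x87 step. `skip64_naive` and `skip64_self_test` are hypothetical names for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang unsigned __int128, bit i = coefficient of x^i. */
typedef unsigned __int128 u128;

/* Multiply by x: shift left; if x^127 was set, fold x^128 back as 0x87. */
static u128 mul_x(u128 t)
{
	return (t << 1) ^ ((t >> 127) * 0x87);
}

/* skip64() as given above. */
static u128 skip64(u128 x)
{
	u128 b64 = (x >> 64);
	u128 b63 = (x >> 63) & ~(u128)0x01;
	u128 b62 = (x >> 62) & ~(u128)0x03;
	u128 b57 = (x >> 57) & ~(u128)0x7f;
	return (x << 64) ^ (b64 ^ b63 ^ b62 ^ b57);
}

/* Reference: advance the tweak one block at a time, 64 times. */
static u128 skip64_naive(u128 t)
{
	for (int i = 0; i < 64; i++)
		t = mul_x(t);
	return t;
}

/* Compare both on inputs with high bits set, so the reduction is exercised. */
static void skip64_self_test(void)
{
	const u128 inputs[] = {
		1,
		(u128)1 << 64,
		(u128)1 << 127,
		~(u128)0,
		((u128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL,
	};
	for (size_t i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++)
		assert(skip64(inputs[i]) == skip64_naive(inputs[i]));
}
```

For instance, skip64((u128)1 << 64) is x^128, which reduces to 0x87.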
Calling this twice multiplies by x^128, i.e. skips exactly 128 blocks. A
4096-byte sector holds 256 16-byte blocks, so the tweaks for both halves
of the sector can then be computed and xored in parallel.
--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel