On Sun, Dec 11, 2016 at 7:48 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> +       switch (left) {
> +               case 7: b |= ((u64)data[6]) << 48;
> +               case 6: b |= ((u64)data[5]) << 40;
> +               case 5: b |= ((u64)data[4]) << 32;
> +               case 4: b |= ((u64)data[3]) << 24;
> +               case 3: b |= ((u64)data[2]) << 16;
> +               case 2: b |= ((u64)data[1]) <<  8;
> +               case 1: b |= ((u64)data[0]); break;
> +               case 0: break;
> +       }

The above is extremely inefficient. Considering that most kernel data
would be expected to be smallish, that matters (i.e. the usual benchmark
would not be about hashing megabytes of data, but instead millions of
hashes of small data).

I think this could be rewritten (at least for 64-bit architectures) as

    #ifdef CONFIG_DCACHE_WORD_ACCESS

        if (left)
                b |= le64_to_cpu(load_unaligned_zeropad(data) &
                                 bytemask_from_count(left));

    #else

        .. do the duff's device thing with the switch() ..

    #endif

which should give you basically perfect code generation (i.e. a single
64-bit load and a byte mask).

Totally untested, just looking at the code and trying to make sense of it.

... and obviously, it requires an actual high-performance use-case to
make any difference.

                  Linus