Applying the hardware-accelerated SHA3 instructions to optimize the
sha3_permute function for the s390x architecture has an insignificant impact
on performance, so I'm wondering what we can do to take full advantage of
those instructions. Optimizing sha3_absorb seems a good way to go, since the
s390x-specific accelerator performs both the XOR of the input block into the
state and the permutation of the state bytes. The downside of implementing
this function is handling the block-size variants for each mode. The s390x
architecture supports the standard block sizes, so we can branch on each
standard size in the supported modes, but should we also consider an
unexpected block size during the implementation?
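As a rough illustration of the fallback question, here is a C sketch of the dispatch only, not of an absorb implementation. It assumes the accelerated path handles exactly the standard SHA3/SHAKE rates (block sizes in bytes, computed from the 200-byte Keccak state) and routes any other size to the existing portable XOR-then-permute code. The names `choose_absorb_path`, `ABSORB_ACCELERATED` and `ABSORB_GENERIC` are hypothetical and not part of Nettle.

```c
#include <stddef.h>

/* Hypothetical labels for the two code paths; not part of Nettle. */
enum absorb_path { ABSORB_ACCELERATED, ABSORB_GENERIC };

/* Standard Keccak rates for the SHA-3 family, in bytes:
   rate = 200 - 2 * n / 8, where n is the 224/256/384/512 in SHA3-n
   or the 128/256 in SHAKEn. SHA3-256 and SHAKE256 share 136. */
static const struct {
  size_t block_size;
  const char *modes;
} standard_rates[] = {
  { 144, "SHA3-224" },
  { 136, "SHA3-256, SHAKE256" },
  { 104, "SHA3-384" },
  {  72, "SHA3-512" },
  { 168, "SHAKE128" },
};

/* Use the accelerated path only for the block sizes the hardware is
   assumed to support; anything else falls back to the portable code,
   so an unexpected size is slower but still handled correctly. */
static enum absorb_path
choose_absorb_path (size_t block_size, const char **modes)
{
  for (size_t i = 0; i < sizeof standard_rates / sizeof standard_rates[0]; i++)
    if (standard_rates[i].block_size == block_size)
      {
        if (modes)
          *modes = standard_rates[i].modes;
        return ABSORB_ACCELERATED;
      }
  if (modes)
    *modes = "non-standard";
  return ABSORB_GENERIC;
}
```

Keeping a generic fallback like this might be a reasonable middle ground: the common modes get the accelerator, and a non-standard block size costs speed rather than correctness.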

regards,
Mamone

On Sun, Aug 29, 2021 at 5:39 PM Maamoun TK wrote:

> I added support for the sha1_compress_n function on the ARM architecture in
> the same branch:
> https://git.lysator.liu.se/mamonet/nettle/-/tree/sha1-compress-n
>
> regards,
> Mamone
>
> On Sat, Aug 21, 2021 at 5:22 AM Maamoun TK wrote:
>
>> On Thu, Aug 19, 2021 at 8:48 AM Niels Möller wrote:
>>
>>> Maamoun TK writes:
>>>
>>> > What is x86/sha1-compress.nlms? How can I implement the nettle_compress_n
>>> > function for that particular file type?
>>>
>>> That's an input file for an obscure "loop mixer" tool. IIRC, it was
>>> written mainly by David Harvey for use with GMP loops. This tool tries
>>> permuting the instructions of an assembly loop, taking dependencies into
>>> account, benchmarks each variant, and tries to find the fastest
>>> instruction sequence. It seems I tried this tool on x86 sha1_compress
>>> back in 2009, on an AMD K7, and it gave a 17% speedup at the time,
>>> according to the commit message for 1e757582ac7f8465b213d9761e17c33bd21ca686.
>>>
>>> So you can just ignore this file. You may also want to look at the more
>>> readable version of x86/sha1-compress.asm from just before that commit.
>>>
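To make the loop-mixer idea concrete, here is a toy C sketch: it enumerates every ordering of a five-instruction loop body that respects the data dependencies and keeps the fastest one. The real tool benchmarks each variant on hardware; here a made-up in-order, single-issue latency model stands in for the benchmark, and the latencies and dependencies (two load-like instructions, each feeding one use, then a combining step) are invented for illustration.

```c
#define N 5

/* Toy loop body: result latency of each instruction and a bitmask of the
   instructions whose results it consumes. 0 and 2 are "slow" (latency 3),
   1 uses 0, 3 uses 2, and 4 combines 1 and 3. */
static const int latency[N] = { 3, 1, 3, 1, 1 };
static const unsigned deps[N] = { 0, 1u << 0, 0, 1u << 2, (1u << 1) | (1u << 3) };

static int order[N], best_order[N];
static int best_cycles = -1;

/* Stand-in for benchmarking: an instruction issues one cycle after the
   previous one, or later if one of its inputs is not ready yet.
   Returns the number of cycles until the last instruction has issued. */
static int
simulate (const int *seq)
{
  int ready[N] = { 0 };
  int cycle = -1;
  for (int i = 0; i < N; i++)
    {
      int ins = seq[i], start = cycle + 1;
      for (int j = 0; j < N; j++)
        if ((deps[ins] >> j & 1) && ready[j] > start)
          start = ready[j];
      ready[ins] = start + latency[ins];
      cycle = start;
    }
  return cycle + 1;
}

/* Depth-first enumeration of all dependency-respecting orderings:
   an instruction may be placed once all of its inputs are placed. */
static void
search (unsigned placed, int depth)
{
  if (depth == N)
    {
      int c = simulate (order);
      if (best_cycles < 0 || c < best_cycles)
        {
          best_cycles = c;
          for (int i = 0; i < N; i++)
            best_order[i] = order[i];
        }
      return;
    }
  for (int ins = 0; ins < N; ins++)
    if (!(placed >> ins & 1) && (deps[ins] & ~placed) == 0)
      {
        order[depth] = ins;
        search (placed | 1u << ins, depth + 1);
      }
}
```

Calling `search (0, 0)` leaves the winner in `best_order`. In this toy model the naive order 0, 1, 2, 3, 4 takes 9 cycles, while interleaving the two slow instructions (0, 2, 1, 3, 4) takes 6. Exhaustive search is only feasible for short loop bodies; a real mixer has to prune or sample the orderings.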
>>
>> Thanks. I left the nlms files as they are and modified x86/sha1-compress.asm
>> to work with the sha1_compress_n function. I've kept the function
>> parameters on the stack, since the instructions can operate directly on
>> memory operands and the x86 calling convention passes parameters on the
>> stack. I'm not sure whether those parameters are read-only or may be
>> modified; TBH, I haven't worked with 32-bit x86 code in 8 years. What I did
>> was reserve slots on the stack for two of the parameters and adjust the
>> copies in those new locations, keeping the original values unmodified.
>>
>> regards,
>> Mamone
>>
>
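For readers less familiar with the interface being discussed, here is a portable C sketch of what a compress_n-style function does: run the one-block SHA-1 compression over `blocks` consecutive 64-byte blocks and return a pointer just past the consumed input. The name `sha1_compress_n_sketch` and its exact signature are assumptions for illustration, not Nettle's internal API. In C, the two values that change during the loop (`blocks` and `input`) are already local copies, which is roughly what reserving separate stack slots achieves in the 32-bit x86 assembly version.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* Portable SHA-1 compression of a single 64-byte block (FIPS 180-4). */
static void
sha1_compress_block (uint32_t state[5], const uint8_t block[64])
{
  uint32_t w[80];
  /* Load the block as 16 big-endian words, then expand to 80. */
  for (int t = 0; t < 16; t++)
    w[t] = (uint32_t) block[4 * t] << 24 | (uint32_t) block[4 * t + 1] << 16
           | (uint32_t) block[4 * t + 2] << 8 | (uint32_t) block[4 * t + 3];
  for (int t = 16; t < 80; t++)
    w[t] = ROTL32 (1, w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]);

  uint32_t a = state[0], b = state[1], c = state[2], d = state[3], e = state[4];
  for (int t = 0; t < 80; t++)
    {
      uint32_t f, k;
      if (t < 20)      { f = (b & c) | (~b & d);         k = 0x5A827999; }
      else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
      else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
      uint32_t tmp = ROTL32 (5, a) + f + e + k + w[t];
      e = d; d = c; c = ROTL32 (30, b); b = a; a = tmp;
    }
  state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e;
}

/* Hypothetical compress_n interface: process `blocks` 64-byte blocks and
   return a pointer just past the last one. Advancing `input` and counting
   down `blocks` only touches the local copies, not the caller's values. */
static const uint8_t *
sha1_compress_n_sketch (uint32_t state[5], size_t blocks, const uint8_t *input)
{
  for (; blocks > 0; blocks--, input += 64)
    sha1_compress_block (state, input);
  return input;
}
```

One way to sanity-check a new assembly implementation is to compare it against repeated one-block calls like these, for example on the padded FIPS 180 "abc" block, whose SHA-1 digest is a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d.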
_______________________________________________
nettle-bugs mailing list
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
