True, that's fair.  Just having this feature for the sake of compiling the "hash" package more efficiently is... well... favouritism?!  Or just a feature that has a single purpose.

Kit

On 16/09/2023 17:45, Florian Klämpfl via fpc-devel wrote:

On 16.09.2023 at 17:45, J. Gareth Moreton via fpc-devel 
<fpc-devel@lists.freepascal.org> wrote:

I missed this post - thanks Florian!

Indeed, SHA-1 is deprecated at least as far as being a cryptographic algorithm 
is concerned, but it still has some uses in data verification in a similar vein 
to MD5.  I know git uses it internally so server branches can't be corrupted.
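
As a small illustration of the data-verification use mentioned above: git derives a blob's object ID by hashing a short header plus the file contents with SHA-1, so any change to the contents changes the ID. A minimal Python sketch, assuming a SHA-1 repository and covering blobs only (the function name git_blob_id is made up for this example):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    empty_id = git_blob_id(b"")
    print(empty_id)           # object ID of the empty blob
    print(len(empty_id) * 4)  # digest width in bits (4 bits per hex digit)
```

The 40 hex digits correspond to the 160-bit digest, i.e. five 32-bit words.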

I have probably spent too much time on SHA-1 already - its awkward size of 160 
bits has always irked me... not a clean power of two!

Speaking of the Intel SHA instructions, can I introduce a merge request that adds 
"CPUX86_HAS_SHA" as a feature flag?

You can, but in this case it is IMO more useful to check at run time using 
the cpu unit, as the instructions are part of a procedure not generated by the 
compiler.
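
To illustrate the run-time approach in a language-neutral way: the idea is to query the CPU once at start-up and pick the implementation accordingly, rather than fixing the choice at compile time. The sketch below is Python, not FPC code. It assumes a Linux-style /proc/cpuinfo listing in which the SHA extensions appear as the flag sha_ni, and the names has_sha_extensions, pick_sha1_backend and read_cpuinfo are hypothetical. In FPC itself the check would go through the cpu unit, as suggested above.

```python
def has_sha_extensions(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in cpuinfo-style text lists 'sha_ni'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only trust the 'flags' field, and match whole tokens only.
        if sep and key.strip() == "flags" and "sha_ni" in value.split():
            return True
    return False


def pick_sha1_backend(cpuinfo_text: str) -> str:
    """Choose an implementation label once, at run time."""
    return "sha-ni" if has_sha_extensions(cpuinfo_text) else "generic"


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the CPU description; return '' where the file is unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(pick_sha1_backend(read_cpuinfo()))
```

A binary built this way still runs on CPUs without the extensions, which a compile-time feature flag alone would not guarantee.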

I know to add it for "cpu_zen" and later, but I'm not sure what the equivalent Intel 
processor is... is "cpu_core_avx2" okay or does there need to be a new one?

Kit

On 15/09/2023 22:48, Florian Klämpfl via fpc-devel wrote:
On 16.09.23 at 15:13, J. Gareth Moreton via fpc-devel wrote:
Hi everyone,

So this past week I've been building on Rika's work by adding an assembly 
version of SHA-1 for x86_64 to complement Rika's i386 version. So far I've 
successfully made a version that runs twice as fast as the Pascal code.  I 
hoped to go even faster by making use of the SSE2 instruction set, but 
currently the end result is slower even though computing the common parts of 4 
rounds simultaneously should be much faster.  This occurs even when I forgo 
writing to the stack and keep pretty much all of the state within registers.  
Preliminary investigation suggests that the slowdown comes from using MOVD/Q to 
transfer data between the XMM registers and general-purpose registers, since 
they are different parts of the CPU.  I'm still amazed it causes this much 
latency though.

I'll keep investigating and seeing if I can squeeze out more performance, but 
otherwise I may just have to fall back on a non-SIMD-optimised implementation.

As SHA-1 is basically deprecated and no longer recommended, I wouldn't spend 
too much time on this. Besides, for SHA-1 and SHA-256 it might be even more 
useful to use the SHA CPU extensions where available. While they were only 
introduced with Ice Lake and Zen, they will become more and more widely 
available in the future.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
