On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote:
I neglected to include -Cpcoreavx, that was my bad.  I'll try again.

According to the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol 2B, page 4-391, the zero flag is set if the source is zero, and cleared otherwise.  Regarding an undefined result, I got confused with the BSF and BSR instructions, sorry.  I guess I was more tired than I thought!  POPCNT returns zero for a zero input.

Ok, that's what I thought.
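
For anyone following along, here is a rough C model of the difference being discussed. It is only a sketch, not code from the compiler or the RTL: the names popcnt32 and bsf32 are made up for illustration, and the functions imitate the documented results in plain C rather than executing the real instructions.

```c
#include <stdint.h>

/* Hypothetical software model of POPCNT (not FPC code).
   The result is well defined for every input, including 0.
   *zf mirrors the zero flag: set iff the source is zero. */
static unsigned popcnt32(uint32_t x, int *zf)
{
    unsigned count = 0;

    *zf = (x == 0);
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical software model of BSF (not FPC code).
   Returns 1 and stores the index of the lowest set bit when x != 0.
   Returns 0 and leaves *index untouched when x == 0, because the
   Intel manual documents the destination as undefined in that case,
   so a caller must not rely on it. */
static int bsf32(uint32_t x, unsigned *index)
{
    unsigned i = 0;

    if (x == 0)
        return 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    *index = i;
    return 1;
}
```

So popcnt32(0, &zf) gives a count of 0 with zf set, whereas bsf32(0, &idx) only reports that there was no bit to find.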

I played a bit with adding code alignment to the loops in the SSE code (aligning before the branch location and/or the branch target), but it only seems to slow the core loop down rather than speed it up.

Did you have any thoughts about moving the NOT instruction up?
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
