On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote:
I neglected to include -Cpcoreavx, that was my bad. I'll try again.
According to the Intel® 64 and IA-32 Architectures Software Developer’s
Manual, Vol 2B, Page 4-391, the zero flag is set if the source is
zero, and cleared otherwise. Regarding an undefined result, I got
confused with the BSF and BSR instructions, sorry. I guess I was more
tired than I thought! POPCNT returns zero for a zero input.
Ok, that's what I thought.
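For reference, the difference between POPCNT and BSF for a zero source can be sketched as a small software model in C. This is only an illustration of the behaviour described above, not code from the compiler or the RTL; the names bit_result, model_popcnt32 and model_bsf32 are made up for this example, and the "defined" field is just a way to mark the case where Intel documents the BSF destination as undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record: destination value plus the zero flag. */
typedef struct {
    uint32_t value;    /* modelled destination register contents */
    bool     zf;       /* zero flag after the instruction */
    bool     defined;  /* false when the destination is documented as undefined */
} bit_result;

/* Model of POPCNT r32: counts the set bits; ZF is set only when src is zero.
   The destination is always well defined (zero for a zero input). */
bit_result model_popcnt32(uint32_t src)
{
    bit_result r = { 0, src == 0, true };
    while (src != 0) {
        src &= src - 1;  /* clear the lowest set bit */
        r.value++;
    }
    return r;
}

/* Model of BSF r32: index of the lowest set bit; for a zero source ZF is set
   and the destination is marked undefined, so callers must check ZF first. */
bit_result model_bsf32(uint32_t src)
{
    bit_result r = { 0, src == 0, src != 0 };
    if (src != 0) {
        while ((src & 1u) == 0) {
            src >>= 1;
            r.value++;
        }
    }
    return r;
}
```

With this model, a zero input to model_popcnt32 gives a usable count of zero, whereas the value returned by model_bsf32 for a zero input should be ignored.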
I experimented with adding code alignment to the loops in the SSE code
(aligning before the branch location and/or the branch target), but it
only seems to slow the core loop down rather than speed it up.
Did you have any thoughts about moving up the NOT instruction?