Hi everyone,

I've noticed that when certain blocks of code are aligned (usually to a 16-byte boundary), the padding bytes in between are filled with a combination of 90H, 66 90H, 66 66 90H and 66 66 66 90H. That works, but any gap longer than 4 bytes needs more than one instruction, and the maximum 15-byte gap needs 4. These extra NOPs might incur a small performance penalty in the instruction queue, although given that the queue generally holds over 50 instructions, the penalty is negligible.
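To make the instruction count concrete, here is a minimal sketch in C of padding a gap using only the four short forms. The helper name fill_short_nops and the largest-first ordering are my own assumptions for illustration; this is not the compiler's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FPC's implementation): pads `len` bytes at
   `buf` using only 90, 66 90, 66 66 90 and 66 66 66 90, largest first.
   Returns the number of NOP instructions emitted. */
size_t fill_short_nops(uint8_t *buf, size_t len)
{
    size_t count = 0;

    while (len > 0) {
        /* Each instruction covers at most 4 bytes: up to three 66H
           prefixes followed by the 90H opcode. */
        size_t chunk = len < 4 ? len : 4;

        for (size_t i = 0; i + 1 < chunk; i++)
            *buf++ = 0x66;
        *buf++ = 0x90;

        len -= chunk;
        count++;
    }
    return count;
}
```

Under these assumptions a 15-byte gap is split into 4 + 4 + 4 + 3 bytes, so the function returns 4, and a 9-byte gap (4 + 4 + 1) returns 3.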
However, the Intel instruction reference recommends the following sequences:

1 byte  - 90H
2 bytes - 66 90H
3 bytes - 0F 1F 00H
4 bytes - 0F 1F 40 00H (AMD still recommends 66 66 66 90H)
5 bytes - 0F 1F 44 00 00H
6 bytes - 66 0F 1F 44 00 00H
7 bytes - 0F 1F 80 00 00 00 00H
8 bytes - 0F 1F 84 00 00 00 00 00H
9 bytes - 66 0F 1F 84 00 00 00 00 00H

Intel does warn that 0F 1FH will raise an invalid-opcode exception (a SIGILL) if the processor doesn't support it, unlike 90H, which is an alias of "xchg %eax, %eax". However, it has been supported since the Pentium Pro, and is all but guaranteed to be supported on AMD64, because AMD64 requires features like SSE2 that only arrived with the Pentium 4, well after the Pentium Pro.

Is it worth updating the longer byte sequences to use the 5-to-9-byte sequences, for a very minor performance boost and a reduction in file entropy? (The 00s will be easier to compress, since they generally appear more frequently across the binary as a whole.)

Yours faithfully,

J. Gareth Moreton

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel