Hi everyone,

I've noticed that when certain blocks of code are aligned (usually to a
16-byte boundary), the padding bytes in between are filled with a
combination of 90H, 66 90H, 66 66 90H or 66 66 66 90H. This works, but any
gap larger than 4 bytes needs several instructions (up to 4 for a 15-byte
gap), which might incur a small performance penalty in the instruction
queue, although given that the queue generally holds over 50 instructions,
the penalty is negligible.
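
To put a number on "up to 4 instructions", here is a rough sketch of the
arithmetic. The function name and the max_nop_len parameter are my own, for
illustration only; this is not how the compiler computes anything.

```python
def nops_needed(pad_bytes, max_nop_len=4):
    """Fewest NOP instructions that can fill pad_bytes of padding,
    assuming any NOP length from 1 to max_nop_len bytes is available."""
    # Ceiling division using integers only.
    return -(-pad_bytes // max_nop_len)

# Padding in front of a 16-byte boundary is between 0 and 15 bytes.
for pad in range(5, 16):
    print(pad, "bytes ->", nops_needed(pad), "instructions")
```

With a 4-byte maximum, gaps of 5 to 8 bytes take 2 instructions, 9 to 12
take 3, and 13 to 15 take 4.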

However, the Intel instruction reference recommends the following
sequences:

1 byte - 90H
2 bytes - 66 90H
3 bytes - 0F 1F 00H
4 bytes - 0F 1F 40 00H (AMD still recommends 66 66 66 90H)
5 bytes - 0F 1F 44 00 00H
6 bytes - 66 0F 1F 44 00 00H
7 bytes - 0F 1F 80 00 00 00 00H
8 bytes - 0F 1F 84 00 00 00 00 00H
9 bytes - 66 0F 1F 84 00 00 00 00 00H
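
As a sketch of how an alignment gap could be filled from this list, assuming
a simple "largest sequence first" strategy (the names LONG_NOPS and
pad_to_alignment are hypothetical, and this is not the compiler's actual
code):

```python
# Byte sequences copied from the list above, keyed by length in bytes.
LONG_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0F 1F 00"),
    4: bytes.fromhex("0F 1F 40 00"),
    5: bytes.fromhex("0F 1F 44 00 00"),
    6: bytes.fromhex("66 0F 1F 44 00 00"),
    7: bytes.fromhex("0F 1F 80 00 00 00 00"),
    8: bytes.fromhex("0F 1F 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0F 1F 84 00 00 00 00 00"),
}

def pad_to_alignment(offset, alignment=16):
    """Return the list of NOP instructions that pads offset up to the
    next multiple of alignment, always taking the longest one that fits."""
    gap = -offset % alignment  # bytes left until the next boundary
    nops = []
    while gap > 0:
        length = min(gap, max(LONG_NOPS))
        nops.append(LONG_NOPS[length])
        gap -= length
    return nops

# A 15-byte gap becomes a 9-byte NOP followed by a 6-byte NOP.
for nop in pad_to_alignment(1):
    print(nop.hex(" "))
```

Under this scheme any gap of up to 15 bytes needs at most 2 instructions.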

Now, Intel does warn that 0F 1FH will trigger a SIGILL if the processor
doesn't support it (unlike 90H, which is historically just the encoding of
"xchg %eax, %eax", or "xchg %ax, %ax" in 16-bit code). However, it has been
supported since the Pentium Pro, and is all but guaranteed to be supported
on AMD64, since that architecture already requires much later features such
as SSE2, which arrived with the Pentium 4. Is it worth updating the longer
padding to use the 5-to-9-byte sequences for a very minor performance boost
and a reduction in file entropy (the 00 bytes should compress better, since
they generally appear more frequently across the binary as a whole)?

Yours faithfully,

J. Gareth Moreton
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
