Hi everyone,

I'm looking at more optimisations under x86_64 that use the BMI instructions.  I've come across one situation that I need clarity on... how are SHL and SHR instructions handled if the shift value exceeds the word size?

For example, say I have the code "power2 := 1 shl index;"... assuming everything here is a LongWord (32-bit unsigned integer), what happens if index is 32 or higher?  Logic would state that the result should be zero, since all of the bits are essentially shifted out of the register, but under x86, the SHL and SHR instructions effectively do the following:

power2 := 1 shl (index mod 32);

("mod 64" for 64-bit registers)
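
To make the difference concrete, here is a minimal sketch that models the two behaviours in portable Pascal. Both helper names (ShlX86_32 and ShlSaturating32) are made up for illustration and are not compiler or RTL routines; the count is masked explicitly so the result does not depend on the host CPU.

```pascal
program ShiftModel;
{$mode objfpc}

{ Hypothetical helper: models the 32-bit x86 SHL, which only looks at
  the low 5 bits of the shift count. }
function ShlX86_32(value, count: LongWord): LongWord;
begin
  Result := LongWord(value shl (count and 31));
end;

{ Hypothetical helper: the "logical" expectation, where every bit is
  shifted out once the count reaches the word size. }
function ShlSaturating32(value, count: LongWord): LongWord;
begin
  if count >= 32 then
    Result := 0
  else
    Result := LongWord(value shl count);
end;

begin
  WriteLn(ShlX86_32(1, 32));        { prints 1 }
  WriteLn(ShlSaturating32(1, 32));  { prints 0 }
end.
```

For counts below 32 the two helpers agree; they only diverge once the count reaches the word size.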

So in this case, 1 shl 32 will return 1.  This may cause problems on other platforms where the index isn't masked like this, and I've noticed code in the compiler that looks out for this.  It matters especially for bitmasks, since "(1 shl 32) - 1" won't return $FFFFFFFF as one might expect, but 0 instead.  The BZHI instruction, which is otherwise great for producing such masks, doesn't behave the same way: if the index is equal to or greater than the word size, it leaves the input unmodified (it does set the carry flag though), so the BZHI equivalent of "(1 shl 32) - 1" returns $FFFFFFFF rather than 0.
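
The mismatch can be sketched as follows. This is only a model under stated assumptions: MaskViaShl32 and MaskViaBzhi32 are hypothetical names, the BZHI model assumes the source operand is all ones, and it takes the index from the low 8 bits of the index register, which is how I read the instruction reference.

```pascal
program MaskModel;
{$mode objfpc}

{ Hypothetical helper: "(1 shl index) - 1" as a 32-bit x86 SHL would
  compute it, with the count masked to its low 5 bits. }
function MaskViaShl32(index: LongWord): LongWord;
begin
  Result := LongWord((LongWord(1) shl (index and 31)) - 1);
end;

{ Hypothetical helper: models a 32-bit BZHI applied to $FFFFFFFF.
  Assumption: only the low 8 bits of the index are used, and the
  source is left untouched when that value is 32 or more. }
function MaskViaBzhi32(index: LongWord): LongWord;
var
  n: LongWord;
begin
  n := index and $FF;
  if n >= 32 then
    Result := $FFFFFFFF
  else
    Result := LongWord((LongWord(1) shl n) - 1);
end;

begin
  WriteLn(MaskViaShl32(32));   { prints 0 }
  WriteLn(MaskViaBzhi32(32));  { prints 4294967295 }
end.
```

For indices 0 to 31 both models give the same mask, so a peephole replacement is only observable when the index can reach the word size.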

From a cross-platform perspective, how are too-large indices generally handled to ensure consistent behaviour? I know AArch64 also reduces the index modulo the word size (32-bit ARM, as far as I'm aware, takes the low byte of the shift register instead, so a count of 32 gives zero there), but I don't know what other platforms do.  If one assumes (index mod word_size), any kind of BZHI optimisation becomes a little janky (an optimisation that doesn't account for a too-large index would only be valid under -O4 rules).

Kit

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel