On Wed 13/06/18 20:28 , Florian Klämpfl [email protected] sent: > Am 12.06.2018 um 14:42 schrieb J. Gareth Moreton: > > > Hi everyone, > > > > > > Sorry to flood the mailing list again with more > ideas and experiments. > > > > > I would like to propose introducing a new pair > of in-built functions for the compiler. > > > > > function FLog2(X: Cardinal): Cardinal; > > > function CLog2(X: Cardinal): Cardinal; > > > > > > FLog2 returns the base-2 logarithm of X, rounded > down, while CLog2 returns the base-2 logarithm of X, rounded up. > > > > > To give examples where these functions could be > useful, FLog2(X) + 1 indicates the minimum number of bits required to > > store X as an unsigned integer, while CLog2(X) > is equal to the maximum number of iterations required for a binary search > > of an X-sized list. > > > > > > Why should they be in-built though? With the > binary search example, the size of the list is sometimes known at compile > > time, hence is a constant - therefore, its > logarithm is also a constant. By pre- calculating the logarithm using > the > > in-built function, it can be used to aid > optimization such as loop unrolling. It also makes them easier to > inline, > > where FLog2(X) on x86_64-win64 translates to a > single line of assembly language: BSR EAX, ECX (unless X is zero, in > > which case ZF is set and EAX is undefined). > > > > > > If there had to be a slight nuance though, it's > what happens if X = 0, since log(0) = - oo, which cannot be stored as an > > integer and may cause problems with the > compiler. I would propose it return 0 as a special case, as this will > fix most > > problems when it comes to loops and storage, and > also ensure that FLog2 and CLog2 are "entire functions". To > give an > > optimised example of FLog(2) in x86-64 assembly > language: > > > > > XOR EDX, EDX > > > BSR EAX, ECX // ZF is set if ECX is zero > > > CMOVZ EAX, EDX // Zero (EDX) written to result > if ZF is set. > > > > > Some kind of deep optimization could be used if > the input is known to be non-zero and remove the instructions either > > side of BSR. > > > > > > > Here again: I would first try find a good pascal implementation of the two > functions, then check, what FPC produces, > actually, FPC has already a BsrDWord function which is very similar to > FLog2, though it returns 255 for 0. And then I > would propose to improve the optimizer to generate code close to hand coded > assembler. > > > __________________________________________ _____ > > fpc-devel maillist - fpc- [email protected] > http://lists.freepascal.org/cgi- bin/mailman/listinfo/fpc-devel > > > >
Aah, I didn't know about that function - thank you. I have looked up algorithms for floor(log2(x)), and the best ones I've found use a heap of bit shifts and other logical operations, and a few conditional checks, I think, which is all somewhat slower than the 10-cycle BSR instruction. Granted, I agree there should be a good Pascal implementation so it is portable. For CLog2(x) the only solution I've found currently is FLog2(x-1)+1 while returning 0 if x is 0 or 1. Mind you, an extra addition, subtraction and a conditional check isn't bad. Gareth P.S. BsrDWord is effectively FLog2, with the 255 return value standing in for -oo. _______________________________________________ fpc-devel maillist - [email protected] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
