Il 23/08/2018 11:34, Marco Borsari via fpc-pascal ha scritto:

It would be for the Wirth optimization in the access of an array,
when the index is 0 or 1, allowing the elimination of a multiplication.

I'm afraid that this sort of optimization is rather outdated, and doesn't take into account a number of factors which make modern processors behave quite differently from old times ones.

First of all, saving a multiplication was important at the times when even integer multiplication was very costly in terms of machine cycles. Today one must carefully check if there's really any saving in replacing a multiplication (which is nowadays very fast) in just a couple of cases (index 0 and 1) with a number of checks and jumps for *all* cases.
Moreover, any compiler will align your array elements on 32bits boundaries (or 64bits if you're in a 64bits platform) for better efficiency, so that your multiplication will be a multiplication by a power of two, which all the modern compilers translate into a much faster shl. No doubt that adding checks and jumps to save a shl doesn't make much sense.

The second important factor to take into account is that modern processors do not execute instructions taken directly from the slower main memory, but they fetch them from the much faster cache. A jump instruction, if the target isn't very near, will generate a cache flush, thus dramatically increasing the execution time. That's why modern compilers try to avoid a branch table whenever possible. In the example you mentioned earlier (http://www.mindfruit.co.uk/2012/01/switch-blocks-jump-tables.html) you can read that gcc translates a switch (= fpc case) construct into a branch table with -O1 (minimal optimization), but with higher optimization (-O3) it avoids the branch table and translates the construct into a sequel of conditional jumps.
In any case for the above reason (avoid cache flushing with a jump target too far away) the code that I had proposed (branch table in the data area) is preferable to the one of your last attempt (branch table in the code area).

You may easily verify what solution provides the best efficiency in your platform, by performing a high number of iterations over your code, with random idx's in the range of your array size, and getting the total execution time (sort of: time your_prog). Keeping in mind that on a different platform (different cache size) it could behave quite differently.

Giuliano
-- 
Do not do to others as you would have them do to you.They might have different tastes.

_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Reply via email to