Sometimes compilers do better than humans. For example, multiply is
expensive and divide is even more expensive, but when one operand is a
constant the compiler can replace them with cheaper instructions.
*Multiply*
x * 10 = x * 8 + x * 2
       = (x shift left 3 bits) + (x shift left 1 bit)
The two shifts can be done in parallel and then added (two clock ticks).
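
As a minimal sketch of the shift-and-add form in C (the function name
mul10 is made up for illustration, and a real compiler chooses its own
instruction sequence):

```c
#include <stdint.h>

/* x * 10 computed as x * 8 + x * 2 using only shifts and one add.
   The result wraps modulo 2^32, exactly as x * 10 would on a uint32_t. */
static inline uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

For example, mul10(7) gives 70.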

*Divide*
x / 3 ≈ x * 341 / 1024   (approximate: 341/1024 is slightly below 1/3)
      = (x * 341) >> 10 bits
17 / 3 = (17 * 341) >> 10 = 5  (2 + 1 clock ticks)
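
A sketch of the divide in C, assuming unsigned 32-bit x and using
hypothetical function names. Because 341 is 1024/3 rounded down, the
result comes out one too low for small exact multiples of 3
((3 * 341) >> 10 is 0, not 1). Rounding the constant up to 342 gives the
right quotient for x from 0 to 511 and first goes wrong at x = 512:

```c
#include <stdint.h>

/* Constant rounded down: 341 = floor(1024 / 3).
   Matches the 17 / 3 example, but is one too low when x is a small
   nonzero multiple of 3. */
static inline uint32_t div3_round_down(uint32_t x)
{
    return (x * 341u) >> 10;
}

/* Constant rounded up: 342 = ceil(1024 / 3).
   Equals x / 3 for 0 <= x <= 511; the 10-bit scale is too coarse
   beyond that. */
static inline uint32_t div3_round_up(uint32_t x)
{
    return (x * 342u) >> 10;
}
```

So div3_round_down(17) and div3_round_up(17) both give 5, while for
x = 3 only div3_round_up gives 1.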

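These numbers (341, 1024) are not unique. Any power of two can serve as
the divisor, and a larger one gives a closer approximation to 1/3:

1024/3 = 341 (truncated), so 1/3 ≈ 341/1024
65536/3 = 21845 (truncated), so 1/3 ≈ 21845/65536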
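
To illustrate the effect of the scale factor, here is a sketch in C with
hypothetical function names and with the constants rounded up rather than
down. With 65536 as the divisor the result matches x / 3 for x below
32768. With 2^33 and a 64-bit product it matches for every 32-bit
unsigned x, which is the form compilers commonly generate for an
unsigned divide by 3:

```c
#include <stdint.h>

/* Scale 2^16: 21846 = ceil(65536 / 3).
   Equals x / 3 for 0 <= x < 32768. */
static inline uint32_t div3_q16(uint32_t x)
{
    return (x * 21846u) >> 16;
}

/* Scale 2^33: 0xAAAAAAAB = ceil(2^33 / 3), and 3 * 0xAAAAAAAB = 2^33 + 1.
   The product needs 64 bits; the result equals x / 3 for all uint32_t x. */
static inline uint32_t div3_u32(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

The trade-off is that a larger scale factor needs a wider intermediate
product, so the constant is chosen to fit the range of x that has to be
handled.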

I thought this amazing - multiply and divide without using multiply or
divide instructions.

Colin


On Sun, 24 Aug 2025 at 11:39, Jonathan Scott <
00001b5498fc732f-dmarc-requ...@listserv.uga.edu> wrote:

> I totally agree that in most cases performance is achieved by using the
> right design and algorithms.  Simplicity and reliability of code is also
> very important, and for code which is not performance-critical there is
> little point in attempting local optimization at the expense of
> simplicity.  It is usually only for extremely intensively executed code
> (innermost loops) where any sort of local optimization is worth the
> effort.  It used to be that reordering sequences of instructions to avoid
> address generation interlocks and other pipeline blocks could achieve
> significant improvements, but recent IBM Z processors now handle much of
> that automatically.  Keeping things in registers (including vector
> registers) to avoid storage access is still useful, and some newer
> instructions can help simplify code as well as improving performance, for
> example the "interlocked-access facility 1" makes it simple and fast to use
> ASI to update shared counters.  The IBM Z hardware people have always said
> that you should use obvious standard sequences of code as those will be the
> ones that they are trying to optimise, so for example exclusive-or of a
> storage location with itself is typically interpreted as an instruction to
> store zeroes in that storage, and the standard MVC with offset of 1 byte is
> interpreted as an instruction to fill storage with a pad byte.  There are a
> few performance oddities that are worth noting at the algorithm level, for
> example if you repeatedly look at the same offset in many 4K pages you may
> get performance degradation because there are only a limited number of
> cache lines for each 256-byte range, so it may be better to maintain a
> separate compact index containing the same information.
>
> And comments are essential not just for future readers of the code, but
> also to ensure that the person writing the code can explain what they are
> doing, ensuring they have a full understanding.  I generally wrote the
> block comments before I wrote the code.  Back in the late 1970s I wrote a
> very concise piece of bit-twiddling code to set VSAM options which was
> particularly tricky to understand, despite detailed comments, and after
> finding myself rechecking it several times over the years, I added a
> comment saying "This code is correct.  Do not waste time checking it.  If
> there is a bug, it is somewhere else!".  Some years later, long after I had
> left that company, I received a note thanking me for how much time that
> comment had saved!
>
> Jonathan Scott
>
