Re: Branch (was: Performance question - adding)

David Crayford Mon, 17 Feb 2014 06:43:36 -0800

On 17/02/2014 10:25 PM, Paul Gilmartin wrote:

> Then you get to factor in how much "readability" is worth to you.
>
> Macros are your friend.  But does providing "readability" at the
> programming interface level make such a macro unpleasantly
> verbose internally?

Unless you desperately need highly optimized code, readability is very important when writing assembler code. I suppose that's definitely the case for most vendors.

The alternative is to use an optimizing compiler, which will go out of its way to remove branches. The following code snippet is a simple routine to convert a flag byte into a string of binary 1s and 0s. The optimizer unrolled the loop and used those fancy new load-on-condition instructions (the LOCRE instructions in the listing) to remove all branches.

I compiled two versions, one with loop unrolling and one without, using the #pragma nounroll directive. The unrolled version was three times faster! Now that's impressive.


 *      static char buffer[CHAR_BIT + 1];
 *      int i;
 *      int numBits = CHAR_BIT;
 *
 *      for ( i = 0; numBits--; i++ )
           LR       r3,r1
 *      {
 *          buffer[i] = ( c & 0x80 ) ? '1' : '0';
           LA       r0,240
           NILF     r1,F'128'
           LA       r8,241
           LA       r9,241
           NILF     r3,F'255'
           LA       r10,241
           LTR      r1,r1
           SLLK     r1,r3,1
           LOCRE    r8,r0
           LR       r3,r1
           NILF     r1,F'128'
           LA       r11,241
           NILF     r3,F'255'
           STC      r8,buffer[]0(,r5,9)
           LA       r2,241
           LTR      r1,r1
           SLL      r3,1
           LR       r1,r3
           LOCRE    r9,r0
           NILF     r3,F'128'
           STC      r9,buffer[]0(,r5,10)
           NILF     r1,F'255'
           LTR      r3,r3
           SLL      r1,1
           LOCRE    r10,r0
           STC      r10,buffer[]0(,r5,11)
 *          c <<= 1;
           LR       r3,r1
           NILF     r1,F'128'
           NILF     r3,F'255'
           LTR      r1,r1
           SLLK     r1,r3,1
           LR       r3,r1
           LOCRE    r11,r0
           NILF     r3,F'255'
           STC      r11,buffer[]0(,r5,12)
           LA       r8,241
           SLLK     r9,r3,1
           NILF     r1,F'128'
           LR       r10,r9
           LA       r11,241
           LTR      r1,r1
           LOCRE    r8,r0
           NILF     r10,F'255'
           NILF     r9,F'128'
           STC      r8,buffer[]0(,r5,13)
           LTR      r9,r9
           SLLK     r8,r10,1
           LOCRE    r11,r0
           LR       r9,r8
           NILF     r8,F'128'
           LA       r1,241
           NILF     r9,F'255'
           STC      r11,buffer[]0(,r5,14)
 *      }
 *
 *      buffer[i] = '\0';
 *
 *      return buffer;
           LA       r3,buffer(,r5,9)
           LTR      r8,r8
           SLLK     r8,r9,1
           LOCRE    r1,r0
           NILF     r8,F'128'
           STC      r1,buffer[]0(,r5,15)
           LTR      r8,r8
           LOCRE    r2,r0
           STC      r2,buffer[]0(,r5,16)
           MVI      buffer[]0(r5,17),0
 *  }
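For anyone who wants to try this with another compiler, here is a sketch of the routine reassembled from the source lines interleaved in the listing. The listing does not show the function header, so the name flagToBinary and the unsigned char parameter are assumptions; the NILF rX,F'255' after each shift suggests c is truncated back to a byte, which is consistent with an unsigned char. flagToBinaryShift is a hypothetical hand-written variant that computes each digit as '0' + bit instead of using ?:, one source-level way to express the same result without a conditional. It relies on C guaranteeing that '1' == '0' + 1, which also holds in EBCDIC (the LA r0,240 and LA r8,241 in the listing load X'F0' and X'F1', EBCDIC '0' and '1').

```c
#include <limits.h>

/* Hypothetical name and signature: the listing omits the function header. */
char *flagToBinary(unsigned char c)
{
    static char buffer[CHAR_BIT + 1];
    int i;
    int numBits = CHAR_BIT;

    for (i = 0; numBits--; i++) {
        buffer[i] = (c & 0x80) ? '1' : '0'; /* test the high-order bit (assumes CHAR_BIT == 8) */
        c <<= 1;                            /* shift next bit up; assignment truncates to a byte */
    }
    buffer[i] = '\0';
    return buffer;
}

/* Hypothetical variant: derive each digit arithmetically instead of with ?: */
char *flagToBinaryShift(unsigned char c)
{
    static char buffer[CHAR_BIT + 1];

    for (int i = 0; i < CHAR_BIT; i++) {
        /* '0' + 0 is '0' and '0' + 1 is '1' in any conforming character set */
        buffer[i] = (char)('0' + ((c >> (CHAR_BIT - 1 - i)) & 1));
    }
    buffer[CHAR_BIT] = '\0';
    return buffer;
}
```

For example, both functions return "10100101" for 0xA5. Like the original, each returns a pointer to its own static buffer, so the result is overwritten by the next call. Whether a particular compiler turns either form into branch-free code depends on the target architecture and optimization level, so the generated listing is the place to check.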


