On 30/04/2013 4:12 AM, Paul Gilmartin wrote:
On Mon, 29 Apr 2013 13:16:59 -0400, John Gilmore wrote:

The question which is faster depends in detail upon

1) whether binary search is implemented in assembly language, in which
case ternary comparison operations are available, or in a
statement-level language other that PL/I, in which case ternary
comparisons must be simulated using a binary decision tree, and

I would expect any decent compiler nowadays to optimize that
difference away.

A decent optimizer would have unrolled the loop. On a z196 a branch misdirection penalty is 19 cycles.

*    struct
*    {
*      char inst[4];
*      char code;
*    } optab[0] =
*    {
*        { "TM ", 0x91 }, { "NI ", 0x94 }, { "CLI", 0x95 },
*        { "OI ", 0x95 }, { "XI ", 0x97 }
*    };
*
*    for ( int i = 0; i < sizeof optab / sizeof optab[0]; i++ )
*    {
*        if ( memcmp( optab[i].inst, opcode, 3 ) == 0 )
LA 15,0 CLC @84optab(3),0(1) LA 14,0 BRE @2L71
         CLC    @84optab+5(3),0(1)
LA 14,5 BRE @2L71
         CLC    @84optab+10(3),0(1)
LA 14,10 BRE @2L71
         CLC    @84optab+15(3),0(1)
LA 14,15 BRE @2L71
         CLC    @84optab+20(3),0(1)
LA 14,20 BRE @2L71
         BRU    @2L40

For the other loop which binary encodes the bitstring the optimizer removes all branching by using those fancy new load on condition instructions. It would be interesting to know how many of those instructions could be processed in a superscalar group in a single cycle.

*    unsigned char c;
* for ( int i = 0; i < 8; i++ ) * {
*        if ( s[i] == '1' )
         LLC    1,0(0,14)(*)Cuchar
         LLC    2,1(0,14)(*)Cuchar
OILL 0,X'0001' LLC 3,2(0,14)(*)Cuchar
         LLC    4,3(0,14)(*)Cuchar
CHI 1,241 LLC 5,4(0,14)(*)Cuchar
         LLC    6,5(0,14)(*)Cuchar
LOCR 15,0,B'1000' LR 0,15 OILL 0,X'0002' CHI 2,241 LOCR 15,0,B'1000' LR 0,15 OILL 15,X'0004' NILF 15,X'000000FF' NILF 0,X'000000FF' CHI 3,241 LOCR 0,15,B'1000' LR 15,0 OILL 15,X'0008' CHI 4,241 LOCR 0,15,B'1000' LR 1,0 OILL 0,X'0010' NILF 1,X'000000FF' NILF 0,X'000000FF' CHI 5,241 LOCR 1,0,B'1000' LR 15,1 OILL 15,X'0020' CHI 6,241 LOCR 1,15,B'1000' LR 15,1 LLC 0,6(0,14)(*)Cuchar OILL 1,X'0040' LLC 14,7(0,14)(*)Cuchar NILF 15,X'000000FF' NILF 1,X'000000FF' CHI 0,241 LOCR 15,1,B'1000' LR 0,15 OILL 0,X'0080' CHI 14,241 LOCR 15,0,B'1000' NILF 15,X'000000FF' * {
*            c |= 1 << i;
*        }
*    }
*    return c;
* }







Both are, of course, radically inferior to branch-table based schemes,
e.g., the one used to implement the C switch staterment and the
analogous PL/I select group, that do no searching at all.

Long ago, maintaining the code generation of a compiler for Pascal,
which language allows radically sparse CASE statements, I selected
whether to use a branch-tree or a branch-table based on the
sparseness of the labels.  If I selected a branch-tree, it was binary,
exploiting the ternary result of the s/370 Compare instruction.  Since
performance of the branch-table is always better, I made the decision
based on (roughly estimated) object code size.  There was no option
for the programmer.

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN


----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN

Reply via email to