On 30/04/2013 4:12 AM, Paul Gilmartin wrote:
On Mon, 29 Apr 2013 13:16:59 -0400, John Gilmore wrote:
The question which is faster depends in detail upon
1) whether binary search is implemented in assembly language, in which
case ternary comparison operations are available, or in a
statement-level language other that PL/I, in which case ternary
comparisons must be simulated using a binary decision tree, and
I would expect any decent compiler nowadays to optimize that
difference away.
A decent optimizer would have unrolled the loop. On a z196 a branch
misdirection penalty is 19 cycles.
* struct
* {
* char inst[4];
* char code;
* } optab[0] =
* {
* { "TM ", 0x91 }, { "NI ", 0x94 }, { "CLI", 0x95 },
* { "OI ", 0x95 }, { "XI ", 0x97 }
* };
*
* for ( int i = 0; i < sizeof optab / sizeof optab[0]; i++ )
* {
* if ( memcmp( optab[i].inst, opcode, 3 ) == 0 )
LA 15,0
CLC @84optab(3),0(1)
LA 14,0
BRE @2L71
CLC @84optab+5(3),0(1)
LA 14,5
BRE @2L71
CLC @84optab+10(3),0(1)
LA 14,10
BRE @2L71
CLC @84optab+15(3),0(1)
LA 14,15
BRE @2L71
CLC @84optab+20(3),0(1)
LA 14,20
BRE @2L71
BRU @2L40
For the other loop which binary encodes the bitstring the optimizer
removes all branching by using those fancy new load on condition
instructions.
It would be interesting to know how many of those instructions could be
processed in a superscalar group in a single cycle.
* unsigned char c;
* for ( int i = 0; i < 8; i++ )
* {
* if ( s[i] == '1' )
LLC 1,0(0,14)(*)Cuchar
LLC 2,1(0,14)(*)Cuchar
OILL 0,X'0001'
LLC 3,2(0,14)(*)Cuchar
LLC 4,3(0,14)(*)Cuchar
CHI 1,241
LLC 5,4(0,14)(*)Cuchar
LLC 6,5(0,14)(*)Cuchar
LOCR 15,0,B'1000'
LR 0,15
OILL 0,X'0002'
CHI 2,241
LOCR 15,0,B'1000'
LR 0,15
OILL 15,X'0004'
NILF 15,X'000000FF'
NILF 0,X'000000FF'
CHI 3,241
LOCR 0,15,B'1000'
LR 15,0
OILL 15,X'0008'
CHI 4,241
LOCR 0,15,B'1000'
LR 1,0
OILL 0,X'0010'
NILF 1,X'000000FF'
NILF 0,X'000000FF'
CHI 5,241
LOCR 1,0,B'1000'
LR 15,1
OILL 15,X'0020'
CHI 6,241
LOCR 1,15,B'1000'
LR 15,1
LLC 0,6(0,14)(*)Cuchar
OILL 1,X'0040'
LLC 14,7(0,14)(*)Cuchar
NILF 15,X'000000FF'
NILF 1,X'000000FF'
CHI 0,241
LOCR 15,1,B'1000'
LR 0,15
OILL 0,X'0080'
CHI 14,241
LOCR 15,0,B'1000'
NILF 15,X'000000FF'
* {
* c |= 1 << i;
* }
* }
* return c;
* }
Both are, of course, radically inferior to branch-table based schemes,
e.g., the one used to implement the C switch staterment and the
analogous PL/I select group, that do no searching at all.
Long ago, maintaining the code generation of a compiler for Pascal,
which language allows radically sparse CASE statements, I selected
whether to use a branch-tree or a branch-table based on the
sparseness of the labels. If I selected a branch-tree, it was binary,
exploiting the ternary result of the s/370 Compare instruction. Since
performance of the branch-table is always better, I made the decision
based on (roughly estimated) object code size. There was no option
for the programmer.
-- gil
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN