All

I've posted the 4 solutions submitted for Mainframe Assembler Coding
Contest Problem #22 here:

http://www.z390.org/z390_Mainframe_Assemble_Coding_Contest.htm

All 4 solutions were run on Windows Vista under z390 V1.5.01e, using the
following text string submitted with the current winning entry by Fritz
Schneider:

         DC    C'Code fastest instruction sequence to count bits '
         DC    C'in an arbitrary string of bytes using currently '
         DC    C'available z/Architecture instructions prior to '
         DC    C'new instruction coming with z196 which is '
         DC    C'estimated to be 5 times faster.'

1.  The winning solution by Fritz Schneider processes up to 120 bytes at a
time.  First it uses 2 EX instructions to move and translate bytes in the
120-byte work area.  Then it uses an add logical instruction (AL) in a BXLE
loop to add up 4 counts at a time in up to 30 words.  (This works because
8 * 30 is 240, which is less than the 255 maximum value allowed in each
byte, so no count can carry into the neighboring byte.)  Then the 4
one-byte accumulators for a block are added separately to the grand total.
For the above text this solution executed 194 instructions.  The working
storage required was a 256-byte reference table and a 120-byte work area.
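
As a rough illustration only, the blocking idea above can be modeled in C.
This is not Fritz's code: the names init_bit_table, count_bits_blocked,
BLOCK, and WORDS are made up for this sketch, and plain C loops stand in
for the EX/TR and AL/BXLE sequences.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 120, WORDS = BLOCK / 4 };  /* 30 words of 4 bytes each */

static uint8_t bit_table[256];  /* bit_table[b] = number of 1 bits in b */

/* Fill the 256-byte reference table once before counting. */
static void init_bit_table(void)
{
    for (int b = 0; b < 256; b++) {
        int n = 0;
        for (int v = b; v != 0; v >>= 1)
            n += v & 1;
        bit_table[b] = (uint8_t)n;
    }
}

/* Count 1 bits in s[0..len-1], one block of up to 120 bytes at a time. */
static unsigned long count_bits_blocked(const unsigned char *s, size_t len)
{
    unsigned long total = 0;

    while (len > 0) {
        size_t n = len < (size_t)BLOCK ? len : (size_t)BLOCK;
        uint8_t work[BLOCK] = {0};  /* unused tail stays zero, adds nothing */

        /* Move-and-translate step: each byte becomes its bit count (0..8). */
        for (size_t i = 0; i < n; i++)
            work[i] = bit_table[s[i]];

        /* Add 30 words, 4 counts per add.  Each byte lane sums at most
           30 * 8 = 240 < 256, so no carry crosses into the next lane. */
        uint32_t acc = 0;
        for (size_t w = 0; w < (size_t)WORDS; w++) {
            uint32_t word;
            memcpy(&word, work + 4 * w, sizeof word);
            acc += word;
        }

        /* Add the 4 one-byte accumulators separately to the grand total. */
        total += (acc & 0xFF) + ((acc >> 8) & 0xFF)
               + ((acc >> 16) & 0xFF) + (acc >> 24);

        s += n;
        len -= n;
    }
    return total;
}
```

Call init_bit_table() once before the first count.  The lane sums do not
depend on byte order, so the sketch behaves the same on big-endian and
little-endian machines.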

2.  The second place solution by Glen Herrmannsfeldt uses a single loop
with LG followed by multiple SRLG, NGR, and AGR instructions, plus BXLE.
For the above text this solution executed 630 instructions.  No working
storage is required, as all operations are done in registers.  This
solution uses a totally different approach from the other 3 solutions, and
might actually prove to be fastest on a real z196.
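
The exact instruction sequence of that entry is not reproduced in this
post, so the following C sketch is only an assumption about its general
shape: the well-known parallel bit count done entirely with shifts, ANDs,
and adds on a 64-bit value, which is what SRLG, NGR, and AGR provide.  The
function names popcount64 and count_bits_swar are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the 1 bits of a 64-bit value using only shift, AND, and add. */
static unsigned popcount64(uint64_t x)
{
    /* Each step adds neighboring fields, doubling the field width. */
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + (x >> 32);
    return (unsigned)x;
}

/* Count 1 bits in s[0..len-1], taking up to 8 bytes per loop pass. */
static unsigned long count_bits_swar(const unsigned char *s, size_t len)
{
    unsigned long total = 0;

    while (len > 0) {
        size_t n = len < 8 ? len : 8;
        uint64_t x = 0;

        /* Gather up to 8 bytes into one 64-bit value (stands in for LG). */
        for (size_t i = 0; i < n; i++)
            x = (x << 8) | s[i];

        total += popcount64(x);
        s += n;
        len -= n;
    }
    return total;
}
```

No table or work area is needed here, which matches the point above that
everything stays in registers.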

3.  The third place solution by Melvyn Maltz uses a single TROO
instruction to translate the entire string to bit counts and then uses a
loop to add the byte counts to an accumulator.  For the above text this
solution executed 897 instructions.  The working storage required was a
256-byte reference table plus a work area equal to the string length.

4.  The fourth place solution by Don Higgins uses a single loop with IC,
IC, AR, and BXLE to count the bits in a register.  For the above text this
solution executed 904 instructions.  The working storage required was a
256-byte reference table.
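
For comparison, a minimal C model of this one-lookup-per-byte loop might
look like the sketch below.  It is an illustration rather than the
submitted code; the names bit_table and count_bits_lookup and the B2/B4/B6
table-building macros are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the 256-byte reference table at compile time:
   bit_table[b] = number of 1 bits in byte value b. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
static const uint8_t bit_table[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Count 1 bits in s[0..len-1] with one table lookup per byte. */
static unsigned long count_bits_lookup(const unsigned char *s, size_t len)
{
    unsigned long total = 0;

    for (size_t i = 0; i < len; i++)   /* loop control: BXLE            */
        total += bit_table[s[i]];      /* fetch byte, look up, add:     */
                                       /* IC, IC, AR in the description */
    return total;
}
```

This is the simplest of the approaches: the only storage is the table, but
every byte costs a full pass through the loop.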

I want to thank Fritz, Glen, Melvyn, and all the discussion participants.
Additional entries with new approaches or major optimizations are welcome.

Don Higgins
[email protected]
