All,

I've posted the 4 solutions submitted for Mainframe Assembler Coding Contest Problem #22 here:
http://www.z390.org/z390_Mainframe_Assemble_Coding_Contest.htm

All 4 solutions were run on Windows Vista using z390 V1.5.01e, with the following text string submitted with the current winning entry by Fritz Schneider:

         DC    C'Code fastest instruction sequence to count bits '
         DC    C'in an arbitrary string of bytes using currently '
         DC    C'available z/Architecture instructions prior to '
         DC    C'new instruction coming with z196 which is '
         DC    C'estimated to be 5 times faster.'

1. The winning solution by Fritz Schneider processes up to 120 bytes at a time. First it uses 2 EX instructions to move and translate bytes in the 120-byte work area. Then it uses an add logical instruction (AL) and a BXLE loop to add up 4 counts at a time in up to 30 words. (Note this works because each translated byte holds a count of at most 8, and 8 * 30 is 240, which is less than the 255 maximum value allowed in each byte, so no carry crosses from one byte into the next.) Then the 4 byte accumulators for a block (one per byte of the word) are added separately to the grand total. For the above text this solution executed 194 instructions. The working storage required was a 256-byte reference table and a 120-byte work area.

2. The second place solution by Glen Herrmannsfeldt uses a single loop with LG followed by multiple SRLG, NGR, and AGR instructions, plus BXLE. For the above text this solution executed 630 instructions. No working storage is required, as all operations are done in registers. This solution uses a totally different approach from the other 3 solutions, and might actually prove to be fastest on a real z196.

3. The third place solution by Melvyn Maltz uses a single TROO instruction to translate the entire string to bit counts and then uses a loop to add the byte counts to an accumulator. For the above text this solution executed 897 instructions. The working storage required was a 256-byte reference table plus a work area equal to the string length.

4. The fourth place solution by Don Higgins used a single loop with IC, IC, AR, and BXLE to count the bits in a register. For the above text this solution executed 904 instructions. The working storage required was a 256-byte reference table.

I want to thank Fritz, Glen, Melvyn, and all the discussion participants. Additional entries with new approaches or major optimizations are welcome.

Don Higgins
[email protected]
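
For anyone who wants to experiment with the ideas above without an assembler, here is a minimal Python sketch of three of the approaches: a plain table lookup per byte (as in solution 4), translating a block and adding 4 byte counts per word (as in solution 1), and register-only shift/mask/add counting (in the spirit of solution 2). It is not any of the submitted code. The function names, the exact mask sequence in the register version, and the choice of EBCDIC code page 037 for the sample text are all assumptions made for illustration.

```python
# Illustrative sketch only; names and details are assumptions, not contest code.

# 256-byte reference table: TABLE[b] is the number of 1 bits in byte value b.
TABLE = bytes(bin(b).count("1") for b in range(256))

# The contest text, encoded as EBCDIC code page 037 (an assumption).
SAMPLE = (
    "Code fastest instruction sequence to count bits "
    "in an arbitrary string of bytes using currently "
    "available z/Architecture instructions prior to "
    "new instruction coming with z196 which is "
    "estimated to be 5 times faster."
).encode("cp037")


def count_by_table(data: bytes) -> int:
    """Look up each byte's bit count and add it to a running total."""
    return sum(TABLE[b] for b in data)


def count_by_packed_words(data: bytes, block: int = 120) -> int:
    """Translate a block to bit counts, then add 4 counts at a time.

    With at most 30 words per block and at most 8 per byte, each byte
    of the word accumulator stays <= 240, so no carry crosses bytes.
    """
    total = 0
    for start in range(0, len(data), block):
        counts = data[start:start + block].translate(TABLE)
        counts += bytes(-len(counts) % 4)  # zero-pad to a whole word
        acc = 0
        for i in range(0, len(counts), 4):
            acc += int.from_bytes(counts[i:i + 4], "big")
        # Split the word into its 4 byte accumulators and add each one.
        total += sum(acc.to_bytes(4, "big"))
    return total


def count_by_register_ops(data: bytes) -> int:
    """Count bits 8 bytes at a time using only shifts, ANDs, and adds."""
    masks = (
        (1, 0x5555555555555555),
        (2, 0x3333333333333333),
        (4, 0x0F0F0F0F0F0F0F0F),
        (8, 0x00FF00FF00FF00FF),
        (16, 0x0000FFFF0000FFFF),
        (32, 0x00000000FFFFFFFF),
    )
    total = 0
    for start in range(0, len(data), 8):
        # Zero-pad a short final chunk; zero bytes add nothing to the count.
        x = int.from_bytes(data[start:start + 8].ljust(8, b"\0"), "big")
        for shift, mask in masks:
            # Add neighbouring fields pairwise, doubling the field width.
            x = (x & mask) + ((x >> shift) & mask)
        total += x
    return total


if __name__ == "__main__":
    for fn in (count_by_table, count_by_packed_words, count_by_register_ops):
        print(fn.__name__, fn(SAMPLE))
```

All three functions should report the same total for any input. The sketch says nothing about relative speed on real hardware; it only shows why the packed-word trick is safe for blocks of up to 120 bytes and why the register version needs no working storage.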
