The assumptions people bring to a coding problem are always interesting and sometimes drôle.
Don Higgins wrote: <begin snippet> I coded and tested a version yesterday that has a 4 instruction BXLE loop. But I'm curious to see if there are faster versions perhaps with some minimum length and trailer byte restrictions. <end snippet> The easiest way to solve this problem is table-driven. A 256-byte table of the form 000000000==>00000000 000000001==>00000001 000000010==>00000001 000000011==>00000010 000000100==>00000001 000000101==>00000010 000000110==>00000010 000000111==>00000111 . . . . . . . . . . . . . . . . . . 11111111==>00001000 in which each of the target table's bytes contains the number of 1 bits, b, in its source byte, where 0<=b<=8, which permits a single TROO instruction to be used to obtain the numbers of bits in each source byte individually. That done, a <load byte, add> loop of one's choice obtains their sum. In the spirit of the contest I have omitted the lowest-level details, but this scheme is very fast. Its "defect" is that it uses scratch storage equal in length to that of the string examined. That could be "remedied", reduced to 256 bytes, by using a TR instruction in another loop and an EX instruction to sop up any remaining bytes. John Gilmore Ashland, MA 01721-1817 USA
