The assumptions people bring to a coding problem are always interesting and 
sometimes drôle.

Don Higgins wrote:

<begin snippet>
I coded and tested a version yesterday that has a 4 instruction BXLE loop. But 
I'm curious to see if there are faster versions perhaps with some minimum 
length and trailer byte restrictions.
<end snippet>

The easiest way to solve this problem is table-driven.  A 256-byte table of the 
form

000000000==>00000000
000000001==>00000001
000000010==>00000001
000000011==>00000010
000000100==>00000001
000000101==>00000010
000000110==>00000010
000000111==>00000111
. . . . . . . . . . . . . . . . . .
11111111==>00001000

in which each of the target table's bytes contains the number of 1 bits, b, in 
its source byte, where 0<=b<=8, which permits a single TROO instruction to be 
used to obtain the numbers of bits in each source byte individually.  That 
done, a <load byte, add> loop of one's choice obtains their sum.

In the spirit of the contest I have omitted the lowest-level details, but this 
scheme is very fast.  Its "defect" is that it uses scratch storage equal in 
length to that of the string examined.  That could be "remedied", reduced to 
256 bytes, by using a TR instruction in another loop and an EX instruction to 
sop up any remaining bytes.

John Gilmore Ashland, MA 01721-1817 USA

Reply via email to