On Tuesday, 25 October 2016 at 19:16:07 UTC, Steven Schveighoffer wrote:
On 10/25/16 12:57 PM, Dmitry Olshansky wrote:
On 10/12/16 3:53 PM, Andrei Alexandrescu wrote:
So we've had a good run with making popFront smaller. In ASCII
microbenchmarks with ldc, the speed is indistinguishable from s = s[1 .. $]. Smaller functions make sure that the impact on instruction cache in
larger applications is not high.
[snip]
Your mission, should you choose to accept it, is to define a combination
front/popFront that reduces the gap.


I'm sooo late to the party. Still a neat trick with UTF lookup table is to cram it into a single word with bit packing. Since we only ever consider top 5 bits of leading byte of which the top one can be safely discarded with a quick < 0x80 check. This leaves us at 4bits -> 16 entry
table, fits nicely in a 64-bit word with 4bits per entry.

Half of the entries in your table are 0 (every char that starts with 10). In addition, you only need 2 bits to store the length (max byte sequence is 4, subtract 1 from sequence to get 2 bits).

So I think you can fit this into a 16-bit word (8 entries of 2 bits each). You just need to pre-check for the invalid sequences (you no longer have to check for 0 result):

if(c < 0x80) s = s[1 .. $];
else if(c < 0xc0 || c > 0xf7) throw new Exception("blah");
else s = s[1 + myStride(c) .. $];

Should behave better on 32-bit system.

You could also store 3-bits to avoid extra addition.

-Steve

The whole point of a LUT to begin with is to reduce instructions.

Reply via email to