28-May-2013 00:42, Martin Nowak wrote:
On 05/27/2013 09:21 PM, Martin Nowak wrote:
 > See unittest/benchmark here:
 > https://gist.github.com/blackwhale/5653927
 >
Looks promising.

This will not detect 0xFF as an invalid UTF-8 sequence.
For sequences of 5 or 6 bytes, which aren't used for Unicode, it will
return a stride of 4.


First of all, there is a minor bug in std.utf: it accepts sequences of 5 and 6 bytes. These are explicitly not defined by the Unicode standard, so they should be rejected as invalid UTF-8 as well.

OK, I just need to consider the next bit, making the whole mask 4 bits wide. Thus I need 16 slots in a register.

The 64-bit version will fit just fine in a register: 4 bits * 16 slots = 64 bits.
The 32-bit version will have to pack 2 bits per slot and add 1 afterwards.
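
A minimal sketch of the packing described above, written in C purely for illustration; it is not the code from the gist or from fast_stride.d. It assumes that ASCII is handled on a separate fast path, that the 4-bit mask is bits 6..3 of the lead byte, and that a stride of 0 marks an invalid lead byte. The names utf8_stride64, utf8_stride32 and both table constants are hypothetical. Since 2 bits per slot leave no room for an "invalid" marker, the 32-bit variant here uses an explicit range check, which is also an assumption.

```c
#include <stdint.h>

/* 16 slots of 4 bits each, indexed by bits 6..3 of a non-ASCII lead byte.
 * Slot value = stride, 0 = invalid lead byte.
 *   slots  0..7   (10xxxxxx, continuation bytes) -> 0
 *   slots  8..11  (110xxxxx)                     -> 2
 *   slots 12..13  (1110xxxx)                     -> 3
 *   slot  14      (11110xxx)                     -> 4
 *   slot  15      (11111xxx, 0xF8..0xFF)         -> 0
 */
#define STRIDE_TABLE64 UINT64_C(0x0433222200000000)

/* 16 slots of 2 bits each, holding stride - 1 for the same index. */
#define STRIDE_TABLE32 UINT32_C(0x3A550000)

/* 64-bit variant: one shift and one mask after the ASCII test. */
static unsigned utf8_stride64(uint8_t lead)
{
    if (lead < 0x80)
        return 1;                           /* ASCII fast path */
    unsigned slot = (lead >> 3) & 0xF;      /* bits 6..3 */
    return (unsigned)((STRIDE_TABLE64 >> (slot * 4)) & 0xF);
}

/* 32-bit variant: 2 bits per slot, +1 afterwards.
 * Invalid lead bytes are rejected by an explicit check (an assumption). */
static unsigned utf8_stride32(uint8_t lead)
{
    if (lead < 0x80)
        return 1;                           /* ASCII fast path */
    if (lead < 0xC0 || lead >= 0xF8)
        return 0;                           /* continuation byte or 0xF8..0xFF */
    unsigned slot = (lead >> 3) & 0xF;      /* bits 6..3 */
    return (unsigned)((STRIDE_TABLE32 >> (slot * 2)) & 0x3) + 1;
}
```

With this layout 0xFF lands in slot 15 and is reported as invalid, as are the lead bytes of 5- and 6-byte sequences (0xF8..0xFD). Lead bytes such as 0xC0, 0xC1 and 0xF5..0xF7 are also invalid in UTF-8 but would still need a separate check; the sketch does not cover them.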

Here is an updated version that I'm testing again:
https://github.com/blackwhale/gsoc-bench-2012/blob/master/fast_stride.d

--
Dmitry Olshansky
