28-May-2013 00:42, Martin Nowak writes:
> On 05/27/2013 09:21 PM, Martin Nowak wrote:
> > See unittest/benchmark here:
> > https://gist.github.com/blackwhale/5653927
> >
> Looks promising.
> This will not detect 0xFF as an invalid UTF-8 sequence.
> For 5- and 6-byte sequences, which Unicode does not use, it will
> return a stride of 4.
First of all, there is a minor bug in std.utf in the sense that it accepts
5- and 6-byte sequences. They are explicitly not defined by the
Unicode standard and should be rejected as invalid UTF as well.
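For reference, a strict stride check only needs the count of leading 1 bits in the lead byte. The following is a minimal C sketch, not std.utf's code; the name `utf8_stride` is hypothetical, and returning 0 for "invalid" stands in for throwing, since C has no exceptions.

```c
#include <stdint.h>

/* Hypothetical helper: stride of the UTF-8 sequence starting with lead byte c.
   Returns 1..4 for a valid lead byte, 0 for a byte that cannot start a
   sequence (0 is used here in place of throwing an invalid-UTF error). */
int utf8_stride(uint8_t c)
{
    if (c < 0x80) return 1;  /* 0xxxxxxx: ASCII */
    if (c < 0xC0) return 0;  /* 10xxxxxx: continuation byte, not a lead */
    if (c < 0xE0) return 2;  /* 110xxxxx */
    if (c < 0xF0) return 3;  /* 1110xxxx */
    if (c < 0xF8) return 4;  /* 11110xxx */
    return 0;                /* 11111xxx: old 5/6-byte leads, 0xFE, 0xFF */
}
```

A fully RFC 3629 conforming validator additionally rejects 0xC0, 0xC1 and 0xF5..0xF7 as lead bytes; this sketch only covers the 5- and 6-byte forms discussed here.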
OK, I just need to consider the next bit as well, making the whole mask 4 bits
wide. Thus I need 16 slots in a register.
The 64-bit version will fit just fine in a register (4*16 = 64).
The 32-bit version will have to pack 2 bits per slot and add 1
afterwards.
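To make the register-table idea concrete, here is a minimal C sketch under stated assumptions; it is not the code in fast_stride.d. It assumes ASCII is handled by a separate `c < 0x80` check, so the 4-bit index is bits 6..3 of the lead byte, `(c >> 3) & 0xF`, which is enough to tell 11110xxx (stride 4) apart from 11111xxx (invalid). The function and constant names are hypothetical, and encoding "invalid" as 0 (with the +1 applied only to nonzero 32-bit slots) is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical sketch: for a non-ASCII lead byte c, bits 6..3 select a slot:
     0xxx -> 0x80..0xBF continuation byte   -> invalid (0)
     10xx -> 0xC0..0xDF                     -> stride 2
     110x -> 0xE0..0xEF                     -> stride 3
     1110 -> 0xF0..0xF7                     -> stride 4
     1111 -> 0xF8..0xFF (5/6-byte, FE, FF)  -> invalid (0) */

/* 64-bit table: 4 bits per slot, slot i at bits 4*i .. 4*i+3,
   holding the stride directly (0 = invalid). */
static const uint64_t STRIDE_TABLE64 = 0x0433222200000000ULL;

/* 32-bit table: 2 bits per slot, slot i at bits 2*i .. 2*i+1,
   holding stride - 1 (0 = invalid). */
static const uint32_t STRIDE_TABLE32 = 0x3A550000u;

int stride64(uint8_t c)
{
    if (c < 0x80)
        return 1;                              /* ASCII fast path */
    unsigned slot = (c >> 3) & 0xF;
    return (int)((STRIDE_TABLE64 >> (4 * slot)) & 0xF);
}

int stride32(uint8_t c)
{
    if (c < 0x80)
        return 1;                              /* ASCII fast path */
    unsigned slot = (c >> 3) & 0xF;
    unsigned v = (STRIDE_TABLE32 >> (2 * slot)) & 0x3;
    return v ? (int)v + 1 : 0;                 /* undo the -1, keep 0 as invalid */
}
```

Either table lives in a single immediate constant, so a lookup is one shift and one mask with no memory access. The 32-bit variant stores stride - 1 so that all 16 entries fit in 32 bits, at the cost of the extra adjustment after the lookup.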
Here is an updated version that I'm testing again:
https://github.com/blackwhale/gsoc-bench-2012/blob/master/fast_stride.d
--
Dmitry Olshansky