On 03-Aug-12 00:40, Artur Skawina wrote:
On 08/02/12 18:47, Dmitry Olshansky wrote:
char[] input = ...;
size_t idx = ...;
size_t len = stride(input, idx);
uint u8word = *cast(uint*)(input.ptr+idx);

So why do we use dchar rather than a UTF-8 word, when it's just as good as
dchar and faster to obtain?

Iff unaligned accesses happen to be legal on the platform _and_ iff doing
them is faster than the (not that complex) decoding.


You read memory either way: byte by byte, or as 1 or 2 words at once (2 if unaligned).

And take a look at std.utf; I'd say the decoding there is rather involved.

In any case there is a minimum of:
mask out upper control bits, shift to the proper position, OR into the result register [repeat per byte]
return result
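
The steps above can be sketched roughly as follows. This is only an illustration of the minimum work, not the std.utf code: `decodeSketch` is a hypothetical name, and it assumes the input is well-formed UTF-8 (no validation of continuation bytes, overlong forms or surrogates, which is where much of the real complexity lives).

```d
// Hypothetical helper, NOT the std.utf implementation.
// Assumes s[idx .. $] starts with a well-formed UTF-8 sequence.
dchar decodeSketch(const(char)[] s, size_t idx)
{
    immutable uint lead = s[idx];
    if (lead < 0x80)
        return cast(dchar) lead;   // single byte, nothing to strip

    // Sequence length from the lead byte: 110xxxxx, 1110xxxx, 11110xxx.
    immutable uint len = lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;

    // Mask out the length-marker bits of the lead byte.
    uint result = lead & (0x7F >> len);

    foreach (i; 1 .. len)
    {
        // Per continuation byte: mask out the 10xxxxxx control bits,
        // shift the accumulator, OR the payload in.
        result = (result << 6) | (s[idx + i] & 0x3F);
    }
    return cast(dchar) result;
}
```

So even without any error checking, that is one mask, one shift and one OR per continuation byte, plus the branch on the lead byte.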


Of course, I'm biased by x86, but my understanding is that unaligned access support is more or less accepted as a good feature; ARMv6+ seems to have it. And I suspect there is a way to recode the above to be friendlier to word-aligned access (e.g. by carrying an explicit leftover word).
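
To make the leftover-word idea a bit more concrete, here is one possible sketch of fetching 4 bytes from an arbitrary offset using only aligned 32-bit loads. `loadWordAligned` is a hypothetical helper; it assumes a little-endian target and that both aligned words covering the requested bytes are readable (i.e. the buffer is padded), neither of which is guaranteed in general.

```d
// Hypothetical helper: fetch the 4 bytes starting at p using only
// aligned 32-bit loads. Assumes little-endian and that the aligned
// words covering [p, p + 4) are readable.
uint loadWordAligned(const(char)* p)
{
    immutable size_t addr = cast(size_t) p;
    immutable uint shift = cast(uint)(addr & 3) * 8;   // misalignment in bits
    auto base = cast(const(uint)*)(addr & ~cast(size_t) 3);

    immutable uint lo = base[0];
    if (shift == 0)
        return lo;                  // already aligned, one load suffices

    immutable uint hi = base[1];    // the "leftover" word
    return (lo >> shift) | (hi << (32 - shift));
}
```

In a loop over the input, `hi` could be kept around and reused as the next `lo`, so each aligned word would be read from memory only once.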

--
Dmitry Olshansky
