On Sunday, 11 June 2023 at 00:05:52 UTC, H. S. Teoh wrote:
> On Sat, Jun 10, 2023 at 09:58:12PM +0000, Cecil Ward via
> Digitalmars-d-learn wrote:
> > On Friday, 9 June 2023 at 15:07:54 UTC,
> > [...]
> On contemporary machines, the CPU is so fast that memory access
> is a much bigger bottleneck than processing speed. So unless an
> operation is being run hundreds of thousands of times, you're
> not likely to notice the difference. OTOH, accessing memory is
> slow (that's why the memory cache hierarchy exists). So utf8 is
> actually advantageous here: it fits in a smaller space, so it's
> faster to fetch from memory, and more of it can fit in the CPU
> cache, so fewer DRAM round trips are needed, which is faster.
> Yes, you need extra processing because of the variable-width
> encoding, but that happens mostly inside the CPU, which is fast
> enough that it generally outstrips the memory round-trip
> overhead. So unless you're doing something *really* complex
> with the utf8 data, it's an overall win in terms of
> performance. The CPU gets to do what it's good at -- running
> complex code -- and the memory cache gets to do what it's good
> at: minimizing the number of slow DRAM round trips.
I completely agree with H. S. Teoh; that is exactly what I was
going to say. The point is that considerations like this have to
be thought through carefully, and the width of types really does
matter in the cases brought up.
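The size difference Teoh describes is easy to see in D itself. A minimal sketch (my own example, not from the thread): the same text stored as UTF-8 (`string`) versus fixed-width UTF-32 (`dstring`):

```d
import std.stdio;

void main()
{
    // Same ten code points, two encodings:
    string  utf8  = "hello, мир";  // immutable(char)[]: variable-width UTF-8
    dstring utf32 = "hello, мир"d; // immutable(dchar)[]: fixed-width UTF-32

    // UTF-8: 1 byte per ASCII char, 2 bytes per Cyrillic char here.
    writeln(utf8.length);                 // 13 bytes
    // UTF-32: always 4 bytes per code point.
    writeln(utf32.length * dchar.sizeof); // 40 bytes
}
```

Three times the footprint for the same text, so the UTF-8 version leaves far more room in cache lines even before any decoding cost is paid.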
But outside these cases, as I said earlier, stick to uint, size_t
and ulong, or uint32_t and uint64_t if the exact size is vital;
do also check out the other std.stdint types, as very
occasionally they are needed.
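For anyone following along, here is a quick sketch of those choices (my example; note that in current D the exact-width aliases live in core.stdc.stdint, which is what std.stdint forwards to):

```d
import core.stdc.stdint : uint8_t, uint32_t, uint64_t;

void main()
{
    size_t n; // pointer-sized: 32 or 64 bits depending on the target
    uint   x; // always 32 bits in D, unlike C's int
    ulong  y; // always 64 bits in D

    // Exact-width aliases, for file formats, wire protocols, etc.
    static assert(uint8_t.sizeof  == 1);
    static assert(uint32_t.sizeof == 4);
    static assert(uint64_t.sizeof == 8);
}
```

Because D already fixes uint at 32 bits and ulong at 64, uint32_t and uint64_t are aliases of them; the stdint names mainly document intent when exact layout matters.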