Re: Why UTF-8/16 character encodings?

Peter Alexander Fri, 24 May 2013 10:45:35 -0700

On Friday, 24 May 2013 at 17:05:57 UTC, Joakim wrote:

This triggered a long-standing bugbear of mine: why are we using these variable-length encodings at all?

Simple: backwards compatibility with all ASCII APIs (e.g. most C libraries), and because I don't want my strings to consume multiple bytes per character when they don't need to.
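A quick way to see both points, sketched in Python purely for illustration (the helper name utf8_widths is made up for this example): ASCII text encodes to exactly the same bytes in UTF-8, and only the non-ASCII characters take more than one byte.

```python
def utf8_widths(s):
    """Return the number of UTF-8 bytes used by each character of s."""
    return [len(ch.encode("utf-8")) for ch in s]


ascii_text = "hello, world"

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8, which is
# why byte-oriented ASCII APIs keep working on such strings.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Only the non-ASCII character (U+00EF) costs an extra byte.
widths = utf8_widths("na\u00efve")
```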


Your language header idea is no good for at least three reasons:

1. What happens if I want to take a substring slice of your string? I'll need to allocate a new string to add the header in.

2. What if I have a long string with the ASCII header and want to append a non-ASCII character on the end? I'll need to reallocate the whole string and widen it with the new header.

3. Even if I have a string that is 99% ASCII then I have to pay extra bytes for every character just because 1% wasn't ASCII. With UTF-8, I only pay the extra bytes when needed.
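To put rough numbers on the three points, here is a small Python sketch. It is only an illustration under stated assumptions: the header-based format from the proposal isn't specified here, so UTF-32 stands in for "every character widened to a fixed size", and the function names are hypothetical.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, fixed_width_bytes) for text.

    UTF-32 (little-endian, no BOM) is an assumed stand-in for a
    fixed-width encoding; it is not the format from the proposal.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


def slice_and_append_demo():
    ascii_part = "a" * 99
    accented = "\u00e9"  # one non-ASCII character (2 bytes in UTF-8)
    utf8 = (ascii_part + accented).encode("utf-8")

    # Point 1: a slice taken at a character boundary is already valid
    # UTF-8; there is no header to copy or rebuild.
    prefix = utf8[:99]

    # Point 2: appending a non-ASCII character just adds its bytes; the
    # existing ASCII bytes are not rewritten or widened.
    appended = ascii_part.encode("utf-8") + accented.encode("utf-8")

    return prefix.decode("utf-8"), appended == utf8


# Point 3: 99 ASCII characters plus one non-ASCII character.
utf8_size, fixed_size = encoded_sizes("a" * 99 + "\u00e9")
```

Under these assumptions the 100-character string takes 101 bytes in UTF-8 against 400 bytes at a fixed four bytes per character. A narrower fixed width would shrink the gap, but the whole string would still pay for its one non-ASCII character.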
