"limited success of UTF-8"

Becoming the de facto standard encoding EVERYWHERE except for Windows, which uses UTF-16, is hardly a failure...

I really don't understand your hatred for UTF-8 - it's simple to decode and encode, fast, and space-efficient. Fixed-width encodings are not inherently fast; the only thing they are faster at is random access to the Nth character rather than the Nth byte. In the rare cases where you need to do a lot of that kind of random access, there's always UTF-32...
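
To make that concrete, here's a rough Python sketch (the helper name is mine, purely illustrative): indexing the Nth byte is O(1), finding the Nth character means scanning, and UTF-32 turns it back into a fixed offset.

    data = "naïve café".encode("utf-8")

    nth_byte = data[7]                  # O(1): just index into the buffer

    # Finding the Nth *character* in UTF-8 means walking the bytes and
    # skipping continuation bytes (those of the form 0b10xxxxxx):
    def nth_char_offset(buf, n):
        count = 0
        for i, b in enumerate(buf):
            if b & 0xC0 != 0x80:        # lead byte of a character
                if count == n:
                    return i
                count += 1
        raise IndexError(n)

    # With UTF-32 the offset is simply 4*n, no scan required:
    fixed = "naïve café".encode("utf-32-le")
    third_char = fixed[4*3:4*3+4].decode("utf-32-le")   # 'v'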

Any fixed-width encoding which can encode every Unicode character must use at least 3 bytes (code points go up to U+10FFFF, which already needs 21 bits), and using 4 bytes is probably going to be faster because of alignment, so I don't see what the great improvement over UTF-32 would be.
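
The arithmetic, spelled out in Python for the sake of it:

    MAX_CODE_POINT = 0x10FFFF                     # highest Unicode code point
    bits_needed = MAX_CODE_POINT.bit_length()     # 21
    bytes_needed = (bits_needed + 7) // 8         # 3
    print(bits_needed, bytes_needed)              # 21 3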

slicing does require decoding
Nope - as long as you slice at byte offsets you already have (say, ones a search returned), there's nothing to decode.

I didn't mean that people are literally keeping code pages. I meant that there's not much of a difference between code pages with 2 bytes per char and the language character sets in UCS.

Unicode doesn't have "language character sets". The different planes only exist for organisational purposes; they don't affect how characters are encoded.
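
A quick illustration in Python - the encoder only ever looks at the code point's numeric value, never at which plane it lives in:

    for ch in ("A", "é", "€", "你", "😀"):   # BMP and plane-1 characters mixed
        cp = ord(ch)
        encoded = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
        print(f"U+{cp:04X} -> {encoded}")
    # The byte length follows only from the code point's magnitude:
    #   <= U+007F: 1 byte, <= U+07FF: 2, <= U+FFFF: 3, otherwise 4.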

?! It's okay because you deem it "coherent in its scheme?" I deem headers much more coherent. :)

Sure, if you change the word "coherent" to mean something completely different... Coherent means that you store related things together, i.e. everything that you need to decode a character in the same place, not spread out between part of a character and a header.
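
And that's exactly what UTF-8 does: the lead byte of each character tells you by itself how many bytes follow, so everything a decoder needs is right there in the character's own bytes. A rough Python sketch of the idea (not production code):

    def utf8_char_len(lead):
        """How many bytes the character starting with this lead byte occupies."""
        if lead < 0x80:          return 1   # 0xxxxxxx: plain ASCII
        if lead >> 5 == 0b110:   return 2   # 110xxxxx
        if lead >> 4 == 0b1110:  return 3   # 1110xxxx
        if lead >> 3 == 0b11110: return 4   # 11110xxx
        raise ValueError("not a lead byte (continuation or invalid)")

    buf = "π ≈ 3.14".encode("utf-8")
    i = 0
    while i < len(buf):
        n = utf8_char_len(buf[i])
        print(buf[i:i+n].decode("utf-8"))
        i += n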

but I suspect substring search not requiring decoding is the exception for UTF-8 algorithms, not the rule.
The only time you need to decode is when you need to do some transformation that depends on the code point, such as converting case or identifying which character class a particular character belongs to. Appending, slicing, copying, searching, replacing, etc. - basically all the most common text operations - can be done without any encoding or decoding.
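
For example, search, replace and slice can all be done on the raw bytes; the only decode below is for printing (Python, purely illustrative):

    hay = "Grüße, Wörld!".encode("utf-8")
    needle = "Wörld".encode("utf-8")

    i = hay.find(needle)                           # plain byte-wise substring search
    replaced = hay.replace(needle, "World".encode("utf-8"))
    prefix = hay[:i]                               # slicing at a match offset is safe

    # UTF-8 is self-synchronising, so a valid encoded needle can only match
    # at character boundaries; decoding only happens here for display:
    print(prefix.decode("utf-8"), replaced.decode("utf-8"))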
