On 17/03/16 20:29, Matthias Clasen wrote:
> Terminology can certainly be confusing at times, but I think that a
> Unicode character is a perfectly well-defined entity, non-withstanding
> the fact that it can be represented in various encodings (a utf8
> sequence, a ucs4 word, a utf-16 surrogate pair, etc).

You mean a code point, then (that's basically what gunichar is). I think
the reason Unicode people are so pedantic about "code point" is that a
code point may or may not be what you actually mean when you say
"character", whereas I rarely see "code point" used with a meaning other
than its Unicode one.

More precisely, a Unicode code point is an abstract entity indexed by a
number, such as U+0041 LATIN CAPITAL LETTER A or U+262D HAMMER AND
SICKLE, which can only be concretely represented as a particular byte
sequence by passing it through an encoding like UCS-4, UTF-8 or
ISO-8859-1. Some encodings are more obvious than others, and non-Unicode
encodings like ISO-8859-1 in particular cannot represent every Unicode
code point.

-- 
Simon McVittie
Collabora Ltd. <http://www.collabora.com/>
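
To illustrate the code point versus encoding distinction made above, here
is a minimal Python sketch (not GLib code). The helper name
encode_code_point is hypothetical, and Python's "utf-32-be" codec is
assumed here as a stand-in for UCS-4. It passes the two code points
mentioned in the message through three encodings and reports when an
encoding cannot represent one of them.

```python
def encode_code_point(code_point, encoding):
    """Return the bytes for one code point in the given encoding,
    or None if that encoding cannot represent it.

    Hypothetical helper for illustration only.
    """
    try:
        # chr() turns the abstract number into a one-code-point string;
        # encode() produces the concrete byte sequence.
        return chr(code_point).encode(encoding)
    except UnicodeEncodeError:
        return None


# U+0041 LATIN CAPITAL LETTER A and U+262D HAMMER AND SICKLE
for cp in (0x0041, 0x262D):
    # "utf-32-be" is used as a stand-in for UCS-4 (assumption).
    for enc in ("utf-32-be", "utf-8", "iso-8859-1"):
        encoded = encode_code_point(cp, enc)
        shown = encoded.hex(" ") if encoded is not None else "not representable"
        print(f"U+{cp:04X} in {enc}: {shown}")
```

The same number yields different byte sequences depending on the
encoding, and ISO-8859-1 has no byte sequence at all for U+262D.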