On Wednesday, 27 November 2013 at 16:22:58 UTC, Wyatt wrote:
Whoops, overzealous pasting. That is, "e\u0308", which composes to "ë". A grapheme cluster seems to represent one printed character: "...a horizontally segmentable unit of text, consisting of some grapheme base (which may consist of a Korean syllable) together with any number of nonspacing marks applied to it."

Is that about right?

-Wyatt

Yes.

A grapheme is also sometimes explained as being the unit that lay people intuitively think of as being a "character".

The difference between a grapheme and a grapheme cluster is just a matter of perspective, like the difference between a character and a code point. In each pair, the former refers to the decoded result, while the latter refers to the sum of its encoded parts (the parts being code points for a grapheme cluster, and code units for a code point).

Yet another example is that of the UTF-32 code unit: one UTF-32 code unit is (currently) equal to one Unicode code point, but both terms are meaningful in the right context.
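To make the layers concrete, here is a rough sketch in Python (chosen only for illustration) that counts code units, code points, and grapheme clusters for the "e\u0308" example. The helper names `code_units` and `approx_grapheme_clusters` are made up for this post, and the clustering is a deliberate simplification: it only attaches combining marks to the preceding base code point, which is not the full segmentation algorithm from the Unicode standard (it ignores Hangul syllable sequences, for instance).

```python
import unicodedata

def code_units(text, encoding):
    """Count the code units `text` occupies in a BOM-less UTF encoding."""
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(text.encode(encoding)) // unit_size

def approx_grapheme_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified on purpose: this is NOT the complete Unicode
    grapheme cluster boundary algorithm.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # combining mark: extend the current cluster
        else:
            clusters.append(ch)     # base code point: start a new cluster
    return clusters

s = "e\u0308"                        # 'e' followed by COMBINING DIAERESIS

print(len(s))                        # code points: 2
print(code_units(s, "utf-8"))        # UTF-8 code units: 3
print(code_units(s, "utf-16-le"))    # UTF-16 code units: 2
print(code_units(s, "utf-32-le"))    # UTF-32 code units: 2
print(len(approx_grapheme_clusters(s)))  # grapheme clusters: 1

# NFC normalization composes the pair into the single code point U+00EB.
print(unicodedata.normalize("NFC", s) == "\u00eb")  # True
```

Note that the UTF-32 code unit count matches the code point count, while the grapheme cluster count stays at one no matter which encoding is used.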
