On Wednesday, 27 November 2013 at 16:22:58 UTC, Wyatt wrote:
Whoops, overzealous pasting. That is, "e\u0308", which
composes to "ë". A grapheme cluster seems to represent one
printed character: "...a horizontally segmentable unit of text,
consisting of some grapheme base (which may consist of a Korean
syllable) together with any number of nonspacing marks applied
to it."
Is that about right?
-Wyatt
Yes.
A grapheme is also sometimes explained as being the unit that lay
people intuitively think of as being a "character".
The difference between a grapheme and a grapheme cluster is just
a matter of perspective, like the difference between a character
and a code point. In each pair, the first term refers to the
decoded result, while the second refers to the sequence of encoded
parts that make it up (where the parts are code points for a
grapheme cluster, and code units for a code point).
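
To make the "sum of parts" view concrete, here is a minimal Python
sketch (Python is just my choice for illustration; the helper names
are made up). It counts code points and code units for the
"e\u0308" example from the quoted post and for its composed form.
Note that approx_grapheme_count is only a crude stand-in that skips
combining marks, not the real segmentation rules from UAX #29.

```python
import unicodedata

def utf8_units(s):
    """Number of 8-bit code units in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

def utf16_units(s):
    """Number of 16-bit code units in the UTF-16 encoding of s (no BOM)."""
    return len(s.encode("utf-16-le")) // 2

def approx_grapheme_count(s):
    """Crude approximation (assumption, NOT UAX #29): count only the
    code points whose canonical combining class is 0."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

decomposed = "e\u0308"  # 'e' followed by U+0308 COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)  # U+00EB

for label, text in (("decomposed", decomposed), ("composed", composed)):
    print(label,
          [f"U+{ord(ch):04X}" for ch in text],  # the code points
          "code points:", len(text),            # a Python str indexes by code point
          "UTF-8 units:", utf8_units(text),
          "UTF-16 units:", utf16_units(text),
          "approx. graphemes:", approx_grapheme_count(text))
```

Both strings display as the same single grapheme, but the
decomposed one is built from two code points and the composed one
from a single code point, and each code point is in turn built
from one or more code units depending on the encoding.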
Yet another example is that of the UTF-32 code unit: one UTF-32
code unit is (currently) equal to one Unicode code point, but
both terms are meaningful in the right context.
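
A small sketch of that last point, again in Python and with
hypothetical helper names: for every sample string the number of
UTF-32 code units matches the number of code points, while UTF-16
needs two code units for a code point outside the BMP.

```python
def utf16_units(s):
    """Number of 16-bit code units in the UTF-16 encoding of s (no BOM)."""
    return len(s.encode("utf-16-le")) // 2

def utf32_units(s):
    """Number of 32-bit code units in the UTF-32 encoding of s (no BOM)."""
    return len(s.encode("utf-32-le")) // 4

samples = [
    "e\u0308",      # base letter plus combining mark: two code points
    "\u00eb",       # precomposed form: one code point
    "\U0001F600",   # one code point outside the BMP
]

for text in samples:
    # len() on a Python 3 str counts code points
    print([f"U+{ord(ch):04X}" for ch in text],
          "code points:", len(text),
          "UTF-32 units:", utf32_units(text),
          "UTF-16 units:", utf16_units(text))
```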