pugs-comm...@feather.perl6.nl wrote:
 In the abstract, Perl is written in Unicode, and has consistent Unicode
-semantics regardless of the underlying text representations.
+semantics regardless of the underlying text representations.  By default
+Perl presents Unicode in "NFG" formation, where each grapheme counts as
+one character.  A grapheme is what the novice user would think of as a
+character in their normal everyday life, including any diacritics.

What's with this NFG / Normal Form G that you refer to? I don't see any mention of that in http://unicode.org/reports/tr15/ ... did you mean NFC?

For that matter, is it possible for all realistic combinations of diacritics and base letters to be represented by a single Unicode codepoint, including all language-dependent graphemes?

I thought NFC gives roughly one codepoint per grapheme, with a few exceptions ... but I could be wrong on that point.
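
As a quick check of that guess, here is a minimal sketch using Python's standard unicodedata module (Python is just a convenient choice here, not something from the spec change, and the helper name nfc_codepoints is made up for illustration). It normalizes two sequences to NFC: one with a precomposed form and one that, as far as I know, has none.

```python
import unicodedata


def nfc_codepoints(text: str) -> list[str]:
    """Return the code points of text after NFC normalization, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# "e" + COMBINING ACUTE ACCENT has a precomposed form, so NFC yields one code point.
composed = nfc_codepoints("e\u0301")

# "q" + COMBINING ACUTE ACCENT has no precomposed form (assumption stated above),
# so NFC leaves the base letter and the combining mark as two code points.
uncomposed = nfc_codepoints("q\u0301")

print(composed)
print(uncomposed)
```

If the second sequence stays at two code points, then NFC alone cannot guarantee one codepoint per grapheme, which is presumably the gap "NFG" is meant to address.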

-- Darren Duncan
