On Tue, 3 Feb 2004, Larry Wall wrote:
> In the case of Perl (or, at least, Perl 5), I made the decision that,
> within the language, strings should be thought of simply as abstract
> sequences of arbitrary integers, and that by default the standards
> enforcement should be at the borders...

That would be my inclination as well.  Only for "interchange" are the
noncharacters like U+FFFF questionable; they are explicitly legitimate
for program-internal use -- that's what they're for.  Code points beyond
the official end of Unicode (U+10FFFF) are a harder call, but I wouldn't
object loudly to treating them the same way(*).
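To make the "enforce at the borders" idea concrete, here is a minimal Python sketch. It assumes strings are held internally as plain lists of non-negative integers. It also assumes the border check reports problems rather than rejecting them outright; that policy choice and the names `is_noncharacter` and `interchange_problems` are hypothetical. The noncharacter set itself is fixed by Unicode: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, 66 in all.

```python
UNICODE_MAX = 0x10FFFF  # last code point defined by Unicode


def is_noncharacter(cp):
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF plus
    U+xxFFFE and U+xxFFFF in each of the 17 planes."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Low 16 bits equal to FFFE or FFFF, but only inside the Unicode range.
    return cp <= UNICODE_MAX and (cp & 0xFFFE) == 0xFFFE


def interchange_problems(seq):
    """Hypothetical border check: internally any non-negative integer is
    allowed; only when data leaves the program do we flag values that are
    not fit for interchange. Returns a list of (index, reason) pairs."""
    problems = []
    for i, cp in enumerate(seq):
        if cp > UNICODE_MAX:
            problems.append((i, "beyond U+10FFFF"))
        elif 0xD800 <= cp <= 0xDFFF:
            problems.append((i, "surrogate"))
        elif is_noncharacter(cp):
            problems.append((i, "noncharacter"))
    return problems


# Inside the program this sequence is perfectly acceptable;
# only at the border are two of its elements flagged.
internal = [0x48, 0xFFFF, 0x110000]
print(interchange_problems(internal))  # [(1, 'noncharacter'), (2, 'beyond U+10FFFF')]
```

Reporting instead of raising leaves the decision to the caller. A program may strip the noncharacters, substitute U+FFFD, or refuse the output.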

(* Well, with one implementation caveat:  UTF-16 can't represent them,
since a surrogate pair reaches no further than U+10FFFF -- although you
could play tricks, extending it by assigning semantics to
nominally-ill-formed surrogate sequences -- so there may be an issue if
that representation is used.)
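The UTF-16 limit follows from its arithmetic. A surrogate pair carries 20 bits of payload, offset by 0x10000, so the largest encodable value is 0x10000 + 2**20 - 1 = 0x10FFFF. A minimal Python sketch of encoding a single code point follows; the name `utf16_units` is hypothetical. It deliberately does not attempt the "tricks" with ill-formed surrogate sequences mentioned above.

```python
def utf16_units(cp):
    """Encode one code point as a list of UTF-16 code units.
    A sketch, not a full codec: lone surrogate values in the BMP
    are passed through unchecked."""
    # 0x10000 + (1 << 20) - 1 == 0x10FFFF: nothing larger fits in a pair.
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} cannot be represented in UTF-16")
    if cp < 0x10000:
        return [cp]                 # BMP: a single 16-bit unit
    v = cp - 0x10000                # 20-bit offset into planes 1..16
    return [0xD800 + (v >> 10),     # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]   # low surrogate: bottom 10 bits


print([hex(u) for u in utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

Calling `utf16_units(0x110000)` raises `ValueError`, which is exactly the representational gap the caveat describes.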

                                                          Henry Spencer


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
