On Tue, 3 Feb 2004, Larry Wall wrote:
> In the case of Perl (or, at least, Perl 5), I made the decision that,
> within the language, strings should be thought of simply as abstract
> sequences of arbitrary integers, and that by default the standards
> enforcement should be at the borders...
That would be my inclination as well. Only for "interchange" are the
non-characters like U+FFFF questionable; they are explicitly legitimate
for program internal use -- that's what they're for. Code points beyond
the official end of Unicode are a harder call, but I wouldn't object
loudly to treating them the same way(*).
(* Well, with one implementation caveat: UTF-16 can't represent them --
although you could play tricks, extending it by assigning semantics to
nominally-ill-formed surrogate sequences -- so there may be an issue if
that representation is used.)
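
To make the distinction above concrete, here is a minimal Python sketch of
"enforcement at the borders" and of the UTF-16 limit. It is an illustration
only, not code from Perl or from any library: the function names are
hypothetical, and the strict policy in ok_for_interchange (rejecting
surrogates and non-characters outright) is an assumption, since the message
only calls non-characters "questionable" for interchange.

```python
MAX_UNICODE = 0x10FFFF  # the official end of Unicode


def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return 0 <= cp <= MAX_UNICODE and (cp & 0xFFFE) == 0xFFFE


def is_beyond_unicode(cp):
    """True for integers past the official end of Unicode."""
    return cp > MAX_UNICODE


def ok_for_interchange(cp):
    """Hypothetical strict border check; internal strings skip it entirely."""
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    in_range = 0 <= cp <= MAX_UNICODE
    return in_range and not is_surrogate and not is_noncharacter(cp)


def utf16_units(cp):
    """Encode one code point as UTF-16 code units, without any 'tricks'."""
    if cp < 0 or cp > MAX_UNICODE:
        # Two surrogates carry 20 bits, so 0x10FFFF is the highest value.
        raise ValueError("not representable in UTF-16: %#x" % cp)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

Under these assumptions, a program can hold 0xFFFF or 0x110000 in an internal
string without complaint; ok_for_interchange rejects both at the border, and
utf16_units raises ValueError only for the second, which is the
implementation caveat noted in the footnote.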
Henry Spencer
[EMAIL PROTECTED]
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/