Re: Perl & unicode weirdness.

Henry Spencer Fri, 06 Feb 2004 17:26:12 -0800

On Wed, 4 Feb 2004, Edmund GRIMLEY EVANS wrote:
> > A conforming implementation of a function like my g(x), or the UTF-8
> > encoding, includes the range check by definition.
> 
> Which definition? Are you sure validation is compulsory?


The definition I provided for g(x) was explicit that it is undefined
outside the specified range.  An implementation which provides a normal
value for, say, g(-1) is not implementing the g(x) I specified. 
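As a minimal sketch, here is what "includes the range check by definition" can look like for the UTF-8 case. It is written in Python, and the helper name `utf8_encode_code_point` is hypothetical. The g(x) definition itself is not reproduced in this message, so only UTF-8 is shown. The sketch assumes the RFC 3629 range: U+0000..U+10FFFF, with the surrogates U+D800..U+DFFF excluded.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (RFC 3629 range assumed).

    The range check is part of what the function *is*: values outside
    U+0000..U+10FFFF, and the surrogates U+D800..U+DFFF, have no UTF-8
    encoding, so they are refused rather than turned into some bytes anyway.
    """
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:X} is not a Unicode scalar value")
    if cp < 0x80:                      # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 4 bytes, cp <= 0x10FFFF here
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Calling it with -1 or 0x110000 raises `ValueError`. The analogue of g(-1) simply has no value.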

> Also, since there's no point in checking for error conditions that you
> don't know how to handle, I hope you have a clear idea of what to do
> with these "illegal" high characters in various circumstances, because
> I don't.

In the absence of specific action by the calling program, *probably* the
best thing to do, when seeing such a character in external input, is to
replace it with U+FFFD.  This increases the chances that the problem will
be noticed, without causing gratuitous malfunctions in cases where it's
not actually important to the program.
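Here is a rough illustration of the replace-with-U+FFFD default. It uses Python's built-in UTF-8 codec rather than Perl, and the function name `decode_external` is made up for this sketch. The input bytes F4 90 80 80 are what the older, pre-RFC 3629 encoding rules would produce for U+110000, one past the Unicode range. Python's decoder treats them as malformed, and with `errors="replace"` it emits one or more U+FFFD characters in their place.

```python
def decode_external(data: bytes) -> str:
    # errors="replace" substitutes U+FFFD for malformed or out-of-range
    # sequences instead of failing, so the rest of the input survives.
    return data.decode("utf-8", errors="replace")

# F4 90 80 80: old-style 4-byte form of U+110000, beyond U+10FFFF.
text = decode_external(b"abc\xf4\x90\x80\x80def")
print(text.startswith("abc"), "\ufffd" in text, text.endswith("def"))
```

The surrounding text comes through intact. The visible U+FFFD marks where the problem was.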

Note that it's important that the calling program be able to override this
behavior, for cases where (a) the funny characters are being used for
internal purposes, (b) more intelligent handling of the error is possible,
or (c) it's important to preserve the original data even if it is
malformed. 
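Continuing in Python as a sketch under stated assumptions, the caller overrides map onto the `errors` argument of `bytes.decode`:

- `"strict"` raises `UnicodeDecodeError`, so the program can handle the error itself, as in (b).
- A handler registered with `codecs.register_error` implements custom handling, also (b).
- `"surrogateescape"` keeps the original bytes recoverable, as in (c).

The handler name `hex_escape_example` and the function `decode_with_policy` are hypothetical. Case (a) is not covered, because Python strings cannot hold values above U+10FFFF at all.

```python
import codecs

def hex_escape(err):
    # Custom policy: render each undecodable byte as a visible \xNN escape,
    # then resume decoding just past the bad bytes.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(f"\\x{b:02x}" for b in bad), err.end

codecs.register_error("hex_escape_example", hex_escape)

def decode_with_policy(data: bytes, errors: str = "replace") -> str:
    # "replace" (U+FFFD) is the default; the caller may pass any
    # registered error-handler name to override it.
    return data.decode("utf-8", errors=errors)

raw = b"abc\xf4\x90\x80\x80def"

# (b) the program handles the error itself
try:
    decode_with_policy(raw, errors="strict")
except UnicodeDecodeError as e:
    print("malformed input at byte", e.start)

# (b) a custom handler
print(decode_with_policy(raw, errors="hex_escape_example"))

# (c) original data preserved: the bytes round-trip exactly
kept = decode_with_policy(raw, errors="surrogateescape")
assert kept.encode("utf-8", errors="surrogateescape") == raw
```

Keeping the default in one place and letting the caller name a different policy is one way to get "replace unless told otherwise" without hard-wiring either behavior.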

                                                          Henry Spencer
                                                       [EMAIL PROTECTED]


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
