On Tue, 3 Feb 2004, Edmund GRIMLEY EVANS wrote:
> > Yes, it would be better to call the more general encoding, say, UTF-P.
> 
> Surely they're the same encoding applied to a different set of points?

What's the specification of UTF-8?  Is it "whatever I say it is", or is
there some particular document which defines it? 

If that particular document is RFC 3629 or Unicode 4.0 (which RFC 3629
defers to), then no, they are not the same encoding, because those
documents clearly specify that only a certain set of points can be encoded
in valid UTF-8. 

> Or would you claim that the function f(x) = 1/x on the interval 0 < x
> < 1 is a different function from f(x) = 1/x on the interval 0 < x < 2?

Arguably not, but the functions:

    g(x) = ( 1/x          for 0 < x < 1
           ( undefined    elsewhere

    h(x) = ( 1/x          for 0 < x < 2
           ( undefined    elsewhere

are clearly and unambiguously different functions.  (One is an extension
of the other, but they definitely are not identical.)  And that's the
sort of function we are talking about. 
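
To make the distinction concrete, here is a minimal sketch in Python,
assuming that "undefined" is modelled by raising ValueError (that choice,
and the error messages, are mine for illustration only):

```python
def g(x):
    """1/x for 0 < x < 1; undefined (raises ValueError) elsewhere."""
    if not 0 < x < 1:
        raise ValueError("g is undefined outside 0 < x < 1")
    return 1 / x


def h(x):
    """1/x for 0 < x < 2; undefined (raises ValueError) elsewhere."""
    if not 0 < x < 2:
        raise ValueError("h is undefined outside 0 < x < 2")
    return 1 / x
```

The two agree wherever g is defined, but h(1.5) returns a value while
g(1.5) raises, so a caller can tell them apart.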

> In a sense they are different functions, but it's convenient and
> natural to give them the same name, and they can both have the same
> implementation if you leave it to the caller to check that x is in
> range.

A conforming implementation of a function like my g(x), or the UTF-8
encoding, includes the range check by definition.  Splitting the code
between a helper function and its caller may be a useful implementation
method, but the helper function is only part of an implementation and
should not be mislabeled as being the whole thing. 
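
A rough sketch of that split, in Python.  The names encode_bits and
utf8_encode are hypothetical, chosen only for illustration; the range
check assumes the RFC 3629 rule that valid code points are U+0000 through
U+10FFFF, excluding the surrogates U+D800 through U+DFFF.

```python
def encode_bits(cp):
    """Helper only: applies the UTF-8 bit pattern with no validity check.

    The caller is assumed to have checked the range already.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode(cp):
    """Range check plus helper: the two together form the encoder."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range for UTF-8")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    return encode_bits(cp)
```

Here encode_bits(0xD800) happily produces three bytes, while
utf8_encode(0xD800) refuses.  Only the second behaves as the
specification requires.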

                                                          Henry Spencer
                                                       [EMAIL PROTECTED]


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
