RE: NFS4 requires UTF-8 [NFC versus NFD]

McDonald, Ira Sun, 24 Feb 2002 15:00:30 -0800

Hi Kent,

SLP is 'Service Location Protocol'; SLPv2 is specified in RFC 2608.
It is widely used for locating printers, file servers, etc.
It is also the basis of storage-server discovery in the new IETF
IPS (IP Storage) Internet-Drafts for iSCSI (SCSI over IP).


The cogent point about the use of NFKC in SLP is that it's
been recommended to us (SLP folks) for use in _all_ string
comparisons of SLP-registered 'service attributes', which may
be URLs or may be any other string data.  Perhaps that's
reasonable for string comparisons (probably better than NFC)?

Since SLP borrows string comparison filters from LDAPv3, the
strings are further case-folded to allow case-insensitive
string comparisons (there is some controversy over this).
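To make the recommendation concrete, here is a minimal Python sketch of "NFKC, then case-fold, then compare", assuming that is roughly what SLP attribute matching would do. The helper names `slp_fold` and `slp_equal` are hypothetical; the exact LDAPv3/SLP matching rules may differ in detail, and `str.casefold()` after NFKC only approximates Unicode's NFKC_Casefold mapping.

```python
import unicodedata

def slp_fold(value: str) -> str:
    """Hypothetical fold for attribute comparison: NFKC, then case folding."""
    return unicodedata.normalize("NFKC", value).casefold()

def slp_equal(a: str, b: str) -> bool:
    # Two attribute values "match" if their folded forms are identical.
    return slp_fold(a) == slp_fold(b)

# Fullwidth letters fold to ASCII under NFKC; casefold() removes case.
print(slp_equal("ＰＲＩＮＴＥＲ", "printer"))          # True
# U+FB01 LATIN SMALL LIGATURE FI becomes "fi" under NFKC.
print(slp_equal("\ufb01le-server", "FILE-SERVER"))  # True
print(slp_equal("printer", "plotter"))              # False
```

The point is that NFKC erases distinctions (width, ligatures) that NFC would keep, which is arguably what you want when matching service attributes.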

But in filenames (e.g., pathname parts of URLs) NFKC seems like
a very poor choice, because it loses information for mapping
back to local legacy charsets, right?
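A small Python sketch of why NFKC is lossy for filenames, using hypothetical names: NFC leaves compatibility characters alone, while NFKC maps them to "plain" equivalents, so two distinct names can collapse to the same string. Once that happens there is no way to tell which original (for example, fullwidth Latin letters from a legacy Japanese charset versus plain ASCII) to map back to. The helper `collides_under` is illustrative only.

```python
import unicodedata

def collides_under(form: str, a: str, b: str) -> bool:
    """True if two distinct names become identical after normalization."""
    return a != b and unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

ligature = "\ufb01le.txt"              # U+FB01 LATIN SMALL LIGATURE FI + "le.txt"
fullwidth = "\uff21\uff22\uff23.dat"   # FULLWIDTH LATIN CAPITAL A, B, C + ".dat"

print(collides_under("NFC", ligature, "file.txt"))    # False: NFC keeps U+FB01
print(collides_under("NFKC", ligature, "file.txt"))   # True: NFKC turns it into "fi"
print(collides_under("NFKC", fullwidth, "ABC.dat"))   # True: width is erased
print(unicodedata.normalize("NFKC", "report\u00b2.doc"))  # report2.doc
```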

Cheers,
- Ira McDonald
  High North Inc

-----Original Message-----
From: Kent Karlsson [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 24, 2002 4:06 AM
To: [EMAIL PROTECTED]
Subject: RE: NFS4 requires UTF-8 [NFC versus NFD]




> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of McDonald, Ira
...
> This becomes even murkier.  W3C _was_ using NFC, as you say, but:

W3C is developing a "W3C normalisation", which is NFC augmented with
considerations of character references (e.g., in XML: &#xhhhh;).
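A rough Python sketch of why character references matter: a string can be NFC as written but stop being NFC once references are expanded (e.g. `e&#x301;` expands to `e` + U+0301 COMBINING ACUTE ACCENT). This is a simplified, hypothetical check (`normalized_with_refs`), not the W3C definition itself, which has further conditions; it handles only numeric references, not named entities.

```python
import re
import unicodedata

# Numeric character references only: &#xhhhh; and &#dddd;.
_CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")

def expand_char_refs(text: str) -> str:
    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))  # hexadecimal reference
        return chr(int(m.group(2)))          # decimal reference
    return _CHAR_REF.sub(repl, text)

def is_nfc(text: str) -> bool:
    return unicodedata.normalize("NFC", text) == text

def normalized_with_refs(text: str) -> bool:
    """Hypothetical check: NFC both as written and after expanding references."""
    return is_nfc(text) and is_nfc(expand_char_refs(text))

print(normalized_with_refs("caf\u00e9"))     # True
print(normalized_with_refs("caf&#xe9;"))     # True
print(normalized_with_refs("cafe&#x301;"))   # False: expands to "e" + U+0301
```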


> a)  When the SLP Project (successor to IETF Service Location WG)
>     recently asked for advice about which normalization to use
>     for SLP string compares, Harald Alvestrand -- author of 
>     RFC 2277 "IETF Policy on Character Sets and Languages" and 
>     RFC 3066 "Tags for the Identification of Languages" -- told
>     us to use NFKC (which folds compatibility equivalents into
>     their base characters).  Note that SLP service attributes
>     frequently contain URLs, so this amounts to advice to use
>     NFKC for comparing URLs.

Which seems reasonable (I haven't checked what SLP is), if properly
augmented (Hangul again...).
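The message does not say which Hangul issue "Hangul again" refers to. As one illustration of Hangul-specific behaviour that an NFKC-based scheme may need to account for: NFKC maps the *compatibility* jamo U+3131 and U+314F to conjoining jamo and then composes them, so two separate letters become the single syllable U+AC00, whereas NFC leaves them untouched.

```python
import unicodedata

# U+3131 HANGUL LETTER KIYEOK and U+314F HANGUL LETTER A (compatibility jamo).
compat = "\u3131\u314f"

nfc = unicodedata.normalize("NFC", compat)
nfkc = unicodedata.normalize("NFKC", compat)

print(nfc == compat)     # True: no canonical decomposition, so NFC is a no-op
print(nfkc == "\uac00")  # True: U+1100 + U+1161 compose to syllable U+AC00
```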

> b)  The latest "Stringprep Profile for Internationalized Host Names"
>     <draft-ietf-idn-nameprep-07.txt> (9 January 2002)
>     by Paul Hoffman (a Unicode and IETF guru) also uses NFKC.
>     Paul is co-author of RFC 2781 "UTF-16, an encoding of ISO 10646".

Still, Hoffman did not invent UTF-16.  That was done by people
at the Unicode Consortium.  Mark Davis was the editor of the amendment
(as it originally was) to 10646 that describes it on the 10646 side.

>     Note that IDN WG core specs are now in working group 'last call'.

They WERE on "last call".  The comments received will need to be
collected and acted upon; the documents were not accepted as-is.
It will take a while before new, revised documents are available for
a new "last call".  Note that IDN also involves "case folding", so
that domain names remain case-insensitive, as well as other mappings
beyond those of NFKC.
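To show what "case folding plus other mappings beyond NFKC" looks like, here is a Python sketch of the mapping and normalization steps as they appear in the later published Nameprep profile (RFC 3491, built on RFC 3454 stringprep), which may differ from draft-07 discussed here. It uses the standard `stringprep` module and the Unicode 3.2 database `unicodedata.ucd_3_2_0`; the function name is hypothetical, and the prohibited-character and bidi checks are omitted.

```python
import stringprep
import unicodedata

def nameprep_map_and_normalize(label: str) -> str:
    """Sketch of Nameprep's mapping + NFKC steps (no prohibition/bidi checks)."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            # Table B.1: "commonly mapped to nothing" (e.g. SOFT HYPHEN).
            continue
        # Table B.2: case folding intended for use with NFKC.
        mapped.append(stringprep.map_table_b2(ch))
    # Stringprep is pinned to Unicode 3.2, hence ucd_3_2_0.
    return unicodedata.ucd_3_2_0.normalize("NFKC", "".join(mapped))

print(nameprep_map_and_normalize("Stra\u00dfe"))     # strasse (sharp s -> "ss")
print(nameprep_map_and_normalize("ex\u00adample"))   # example (soft hyphen dropped)
print(nameprep_map_and_normalize("\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25"))  # example
```

Note that mapping sharp s to "ss" is a case-folding step that NFKC alone would never perform.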


                Kind regards
                /kent k

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
