RE: Unicode in a URL

2001-04-27 Thread Mike Brown
I asserted, referring to section 4.2.2 of the XML spec: <!ENTITY greeting SYSTEM "http://somewhere/getgreeting?lang=es&name=C%C3%A9sar"> ]> The name César is represented here as C%C3%A9sar in the UTF-8 based escaping, as per the XML requirement. You replied: What the XML spec (and all the
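
For illustration, a minimal Python sketch of how "César" turns into C%C3%A9sar: the string is first encoded as UTF-8, then each byte outside the unreserved ASCII set is written as %HH. Using Python's standard `urllib.parse.quote` is our choice for the example, not something the thread uses.

```python
from urllib.parse import quote

name = "C\u00e9sar"            # "César"; é is U+00E9
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)              # b'C\xc3\xa9sar'  (é becomes the two bytes C3 A9)

# quote() encodes str input as UTF-8 by default, then writes each byte
# outside the unreserved ASCII set as %HH.
escaped = quote(name)
print(escaped)                 # C%C3%A9sar
```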

RE: Unicode in a URL

2001-04-26 Thread Mike Brown
"W3C specifies to use %-encoded UTF-8 for URLs." I think that's an overstatement. Neither the W3C nor the IETF makes such a specification. http://www.w3.org/TR/charmod/#sec-URIs contains many ambiguities, conflicts with XML and HTTP, and is not yet a Recommendation.

RE: How will software source code represent 21 bit unicode characters?

2001-04-23 Thread Mike Brown
William Overington wrote: In Java source code one may currently represent a 16 bit unicode character by using \uhhhh where each h is any hexadecimal character. How will Java, and maybe other languages, represent 21 bit unicode characters? \uhhhh in Java source becomes a value of the
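
Because a \uhhhh escape yields one 16-bit UTF-16 code unit, a character above U+FFFF has to be written in Java source as two escapes forming a surrogate pair. A small Python sketch of that arithmetic (the function name `java_escape` is hypothetical, for illustration only):

```python
def java_escape(code_point: int) -> str:
    """Return the Java source escape(s) for a Unicode scalar value.

    Code points up to U+FFFF need one \\uhhhh escape; supplementary code
    points (U+10000..U+10FFFF) need a UTF-16 surrogate pair.
    """
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        return f"\\u{code_point:04X}"
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return f"\\u{high:04X}\\u{low:04X}"

print(java_escape(0x00E9))   # \u00E9
print(java_escape(0x1D11E))  # \uD834\uDD1E  (MUSICAL SYMBOL G CLEF)
```

The result can be cross-checked against any UTF-16 encoder, since the surrogate pair is exactly what UTF-16 emits for that code point.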

RE: Sun's Java encodings vs IANA's character set registry

2001-04-12 Thread Mike Brown
Mark Davis wrote: If you want portability, you won't go there. Even with the same IANA name, the probability that two codepage mappings on different platforms produce precisely the same results in all circumstances is, in our experience, very very low. Yes, I want to be able to say with a

Sun's Java encodings vs IANA's character set registry

2001-04-11 Thread Mike Brown
In an effort to determine the extent to which character sets that might be used on the Internet can be handled by software relying on the native character encoding handling of Sun's J2EE platform, I am making a table that correlates the names and aliases from the IANA's registry of character sets
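
The same kind of alias-correlation table can be built against any platform's converter registry. As an analogue only (this is Python's `codecs.lookup`, not Sun's J2EE API), the sketch below resolves a small, hand-picked sample of IANA names and aliases to the codec the platform would actually use, and flags names it does not recognize.

```python
import codecs

def canonical_codec(name):
    """Return the canonical codec name for `name`, or None if unsupported."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

# A small sample of names and aliases from the IANA registry (not exhaustive).
sample = ["ISO-8859-1", "ISO_8859-1", "latin1", "l1", "UTF-8", "US-ASCII", "EBCDIC-US"]

table = {name: canonical_codec(name) for name in sample}
for name, codec in table.items():
    print(f"{name:12} -> {codec or 'unsupported'}")
```

Names that map to the same canonical codec are aliases as far as this platform is concerned; that says nothing about whether another platform's converter of the same name produces identical output.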

[unicode] Re: UCS-2 Files

2001-03-22 Thread Mike Brown
When is a byte not eight bits? The Web version of the Oxford English Dictionary (http://dictionary.oed.com) says a byte is always eight bits: Well, just my cursory research shows that to be an overstatement. http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=byte says: A byte may

RE: list of the characters that aren't supported by XML/WML

2001-02-02 Thread Mike Brown
Bruce Maginnis wrote: Do you know where I could get a list of the characters that aren't supported by XML/WML, i.e. the ones that need to be inserted in actual Unicode values?? The characters that are not allowed in XML documents are not allowed at *all*, not even via character references
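
In XML 1.0 the set of allowed characters is the Char production [2]: #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. Character references must also refer to a Char, so something like &#xC; is rejected as well. A Python sketch of that check (covering XML 1.0 only; the helper names are hypothetical):

```python
def is_xml_char(ch: str) -> bool:
    """True if `ch` matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def disallowed_chars(text: str):
    """Return (index, "U+XXXX") for every character XML 1.0 forbids."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if not is_xml_char(c)]

print(disallowed_chars("caf\u00e9\x0c\ufffe ok"))  # [(4, 'U+000C'), (5, 'U+FFFE')]
```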

RE: i18n URIs (was Re: Supporting this is going to be loads of

2000-11-28 Thread Mike Brown
Addison Phillips wrote: This code is available in a number of places. A good one (embedded in a clear discussion of URIs and I18n) is located at: http://www.w3.org/International/O-URL-code.html This is a nice informative document, but to my knowledge, there are no normative specifications
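
As we understand the approach on that W3C page, ASCII characters are left alone and each non-ASCII character is replaced by the %HH escapes of its UTF-8 bytes. A Python sketch under that assumption; `escape_non_ascii` is a hypothetical name and this is not the W3C's own code.

```python
def escape_non_ascii(uri: str) -> str:
    """Percent-escape the UTF-8 bytes of non-ASCII characters only.

    ASCII characters, including reserved ones such as '/', '?' and '&',
    are passed through unchanged.
    """
    out = []
    for ch in uri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)

print(escape_non_ascii("http://example.org/caf\u00e9?q=\u65e5"))
# http://example.org/caf%C3%A9?q=%E6%97%A5
```

Because every ASCII character, including an existing %, is left alone, already-escaped input is not escaped twice; the flip side is that ASCII characters that are illegal in URIs, such as spaces, are not fixed either.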

RE: information request; using unicode in HTML form; urlencoded

2000-10-06 Thread Mike Brown
"The last rule will clip Unicode character to an 8-bit representation." The HTML Recommendation and the IETF RFC for URIs both cover this. Anything URL-encoded is supposed to be UTF-8 encoded first (see the URI RFC). However, the HTML Recommendation's section on form data is a little
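
The clipping problem comes from which character encoding is applied before the %HH escaping. A Python sketch (standard `urllib.parse.urlencode`, which escapes in the application/x-www-form-urlencoded style with '+' for spaces) showing that the same form field yields different bytes under UTF-8 and ISO-8859-1, and that a character with no 8-bit representation cannot be sent as Latin-1 at all:

```python
from urllib.parse import urlencode

form = {"name": "C\u00e9sar Ch\u00e1vez"}

utf8 = urlencode(form)                        # str values encoded as UTF-8 by default
latin1 = urlencode(form, encoding="latin-1")  # same data as ISO-8859-1 bytes
print(utf8)    # name=C%C3%A9sar+Ch%C3%A1vez
print(latin1)  # name=C%E9sar+Ch%E1vez

# U+65E5 has no ISO-8859-1 byte, so strict Latin-1 encoding fails outright.
try:
    urlencode({"q": "\u65e5"}, encoding="latin-1")
except UnicodeEncodeError as exc:
    print("not representable in Latin-1:", exc.reason)
```

A receiver that does not know which encoding the client chose cannot tell %E9 (Latin-1 é) from the first byte of some UTF-8 sequence, which is why the choice has to be agreed on out of band.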

RE: UTF-16 Character Set

2000-08-31 Thread Mike Brown
Shekhar Jagtap wrote: I am looking for UTF-16 Character Set. Please advise as to where this would be available. IETF RFC 2781: UTF-16, an encoding of ISO 10646 http://www.faqs.org/rfcs/rfc2781.html ISO/IEC 10646-1:1993 Amendment 1 http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-16.html Mark
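
One practical detail RFC 2781 covers is byte order. As we read its rules for text labeled plain "UTF-16": a leading FE FF or FF FE byte-order mark selects big- or little-endian, and text without a BOM should be taken as big-endian. A Python sketch of that decision (`decode_utf16` is a hypothetical helper name):

```python
import codecs

def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes following RFC 2781's byte-order rules.

    A leading BOM (FE FF or FF FE) selects the byte order and is dropped;
    without a BOM the text is interpreted as big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")

print(decode_utf16(b"\xfe\xff\x00C\x00\xe9"))  # Cé  (BOM, big-endian)
print(decode_utf16(b"\xff\xfeC\x00\xe9\x00"))  # Cé  (BOM, little-endian)
print(decode_utf16(b"\x00C\x00\xe9"))          # Cé  (no BOM, big-endian assumed)
```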

RE: RFC 1766

2000-08-10 Thread Mike Brown
"...the claim that RFC 1766 freezes obsolete versions..." Actually, the claim is that RFC 1766 could be interpreted that way, not that it is actually trying to say so. The RFC author's recent statement of intent clarifies that a more lenient interpretation is prudent. The reason it is important to

RE: Summary: xml:lang validity and RFC 1766 refs to outdated

2000-08-09 Thread Mike Brown
"I don't see anything in RFC 1766 that hardcodes it to the 1988 versions of either 639 or 3166." I have taken this discussion off the Unicode list. I only started the thread here because I was referencing an earlier post and because ISO 639 language code updates were topical a couple months

RE: Summary: xml:lang validity and RFC 1766 refs to outdated code

2000-08-08 Thread Mike Brown
"XML 1.0 says that xml:lang attributes must match production 33." In fact, not so. Productions 33-38 have no normative value whatsoever, as there is neither a production nor normative language connecting them with the rest of XML 1.0. [...] In recognition of this fact, official erratum E73
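
To make the mismatch concrete, here is a sketch comparing what productions 33 to 38 would accept, if they were applied, with the RFC 1766 tag grammar (Primary-tag *( "-" Subtag ), each part 1 to 8 letters). Only syntax is checked, not whether a code is actually assigned in ISO 639, ISO 3166 or the IANA registry.

```python
import re

# XML 1.0 productions [33]-[38] collapsed into one pattern:
# LanguageID ::= Langcode ('-' Subcode)*, where Langcode is a two-letter
# ISO 639 code, an IANA code ("i-..."), or a user code ("x-...").
PRODUCTION_33 = re.compile(r"(?:[A-Za-z]{2}|[iI]-[A-Za-z]+|[xX]-[A-Za-z]+)(?:-[A-Za-z]+)*")

# RFC 1766 grammar: Primary-tag *( "-" Subtag ), each 1 to 8 letters.
RFC_1766 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

for tag in ["en-US", "i-navajo", "chr", "en-abcdefghij"]:
    p33 = PRODUCTION_33.fullmatch(tag) is not None
    r1766 = RFC_1766.fullmatch(tag) is not None
    print(f"{tag:14} production 33: {p33!s:5}  RFC 1766: {r1766}")
```

A three-letter primary tag such as "chr" fits the RFC 1766 grammar but not production 35's two-letter ISO639Code, while production 38 puts no length limit on subcodes, so "en-abcdefghij" passes it but not RFC 1766.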

List archive URL again?

2000-07-03 Thread Mike Brown
http://www.unicode.org/unicode/consortium/distlist.html does not mention the archives. Where are they, again? - Mike

RE: List archive URL again?

2000-07-03 Thread Mike Brown
I wrote: http://www.unicode.org/unicode/consortium/distlist.html does not mention the archives. Where are they, again? Actually it does mention them, but it just says they're linked from the Unicode home page. I don't think that's true.

RE: Gender symbols

2000-06-26 Thread Mike Brown
"I have sometimes wondered why these two useful, pre-existing symbols are not used in the U.S. to denote 'male' and 'female' on e.g. restroom doors. One possibility is that, because they are frequently associated with 'sexuality'..." A more likely explanation is that they are almost never used