I asserted, referring to section 4.2.2 of the XML spec:
<!ENTITY greeting SYSTEM
"http://somewhere/getgreeting?lang=es&name=C%C3%A9sar">
]>
The name César is represented here as C%C3%A9sar, using the
UTF-8 based escaping, as per the XML requirement.
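
As a minimal sketch of the escaping described above, the following Python shows how the name maps to C%C3%A9sar when the bytes being escaped are UTF-8, and how a different result appears if ISO-8859-1 is assumed instead. The host and query parameter names are taken from the example entity; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

name = "César"

# quote() converts the string to UTF-8 bytes by default, then %-escapes
# every byte that is not an unreserved ASCII character.
utf8_escaped = quote(name, safe="")

# Escaping the ISO-8859-1 bytes instead gives a different string, which is
# why the choice of encoding has to be agreed on by both ends.
latin1_escaped = quote(name, safe="", encoding="latin-1")

url = "http://somewhere/getgreeting?lang=es&name=" + utf8_escaped

print(utf8_escaped)    # C%C3%A9sar
print(latin1_escaped)  # C%E9sar
print(url)

# Round trip: unquote() also assumes UTF-8 unless told otherwise.
assert unquote(utf8_escaped) == name
```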
You replied:
What the XML spec (and all the
W3C specifies to use %-encoded UTF-8 for URLs.
I think that's an overstatement.
Neither the W3C nor the IETF makes such a specification.
http://www.w3.org/TR/charmod/#sec-URIs
contains many ambiguities, conflicts with XML and HTTP,
and is not yet a recommendation.
William Overington wrote:
In Java source code one may currently represent a 16 bit
unicode character by using \uhhhh where each h is any
hexadecimal character.
How will Java, and maybe other languages, represent 21 bit unicode
characters?
\u in Java source becomes a value of the
Mark Davis wrote:
If you want portability, you won't go there. Even with the
same IANA name, the probability that two codepage mappings
on different platforms produce precisely the same results
in all circumstances is, in our experience, very very low.
Yes, I want to be able to say with a
In an effort to determine the extent to which character sets that might be
used on the Internet can be handled by software relying on the native
character encoding handling of Sun's J2EE platform, I am making a table that
correlates the names and aliases from the IANA's registry of character sets
When is a byte not eight bits?
The Web version of the Oxford English Dictionary
(http://dictionary.oed.com)
says a byte is always eight bits:
Well, just my cursory research shows that to be an overstatement.
http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=byte says:
A byte may
Bruce Maginnis wrote:
Do you know where I could get a list of the characters
that aren't supported by XML/WML, i.e. the ones that need
to be inserted in actual Unicode values??
The characters that are not allowed in XML documents are not allowed at
*all*, not even via character references
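
A rough way to find such characters, assuming the Char production of XML 1.0 (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF), is sketched below in Python. The function names are hypothetical and not part of any XML library.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def disallowed_chars(text: str) -> list[str]:
    """List the code points in text that may not appear in an XML 1.0 document."""
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml10_char(ord(ch))]


# NUL, vertical tab and U+FFFE are rejected; tab is allowed.
print(disallowed_chars("ok\x00\x0b\tfine\ufffe"))
```

Because these code points are excluded by the Char production itself, writing one as a character reference such as &#x0; does not make it legal.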
Addison Phillips wrote:
This code is available in a number of places. A good one
(embedded in a clear discussion of URIs and I18n) is
located at:
http://www.w3.org/International/O-URL-code.html
This is a nice informative document, but to my knowledge, there are no
normative specifications
The last rule will clip Unicode character to an 8-bit
representation
The HTML Recommendation and the IETF RFC for URIs both cover this. Anything
URL-encoded is supposed to be UTF-8 encoded first (see the URI RFC).
However, the HTML Recommendation's section on form data is a little
Shekhar Jagtap wrote:
I am looking for UTF-16 Character Set
Please advise as to where this would be available
IETF RFC 2781: UTF-16, an encoding of ISO 10646
http://www.faqs.org/rfcs/rfc2781.html
ISO/IEC 10646-1:1993 Amendment 1
http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-16.html
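
For orientation, here is a small Python sketch of the surrogate-pair arithmetic that UTF-16 uses for code points above U+FFFF, as described in RFC 2781. The function name is illustrative, and the result is checked against Python's own UTF-16 codec rather than taken on trust.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a high and a low surrogate."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000                 # 20-bit value
    high = 0xD800 | (v >> 10)        # top 10 bits
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")

# Cross-check against the built-in big-endian UTF-16 encoder.
expected = chr(0x1D11E).encode("utf-16-be")
assert expected == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```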
Mark
the claim that RFC 1766 freezes obsolete versions
Actually, the claim is that RFC 1766 could be interpreted that way, not that
it is actually trying to say so. The RFC author's recent statement of intent
clarifies that a more lenient interpretation is prudent.
The reason it is important to
I don't see anything in RFC 1766 that hardcodes it to the
1988 versions of either 639 or 3166.
I have taken this discussion off the Unicode list. I only started the thread
here because I was referencing an earlier post and because ISO 639 language
code updates were topical a couple months
XML 1.0 says that xml:lang attributes must match production 33
In fact, not so. Productions 33-38 have no normative value
whatsoever, as there is neither a production nor normative
language connecting them with the rest of XML 1.0.
[...]
In recognition of this fact, official erratum E73
http://www.unicode.org/unicode/consortium/distlist.html
does not mention the archives. Where are they, again?
- Mike
Mike J. Brown, software engineer at webb.net in Denver, Colorado, USA
My XML/XSL resources:
I wrote:
http://www.unicode.org/unicode/consortium/distlist.html
does not mention the archives. Where are they, again?
Actually it does mention them, but it just says they're linked from the
Unicode home page. I don't think that's true.
I have sometimes wondered why these two useful, pre-existing symbols
are not used in the U.S. to denote 'male' and 'female' on
e.g. restroom doors. One possibility is that, because they are
frequently associated with 'sexuality'
A more likely explanation is that they are almost never used