Dear friends, colleagues, everybody,
W3C has opened a position in Internationalization at Keio University
in Japan, because I'm leaving the W3C Team at the end of March.
For details, please see http://www.w3.org/2004/12/i18nposition.
For other open positions at W3C, please see
Dear Unicoders,
Some of you may be interested in this:
After discussion with Chris Newman, author of the Internet Draft
http://www.ietf.org/internet-drafts/draft-newman-i18n-comparator-02.txt,
we have created a new mailing list, [EMAIL PROTECTED],
for discussion (and hopefully completion) of this
The Internationalization Working Group of the W3C is glad
to announce the publication of two new documents:
Character Model for the World Wide Web 1.0: Fundamentals
(http://www.w3.org/TR/charmod, Last Call) and
Character Model for the World Wide Web 1.0: Normalization
At 23:34 03/12/07 +0900, Jungshik Shin wrote:
On Sun, 7 Dec 2003, Peter Jacobi wrote:
There is some mixup of lang and encoding tagging, which I didn't fully
understand.
When lang is not explicitly specified, Mozilla resorts to 'inferring'
'langGroup' ('script (group)' would have been a
At 23:16 03/12/07 +0900, Jungshik Shin wrote:
On Sun, 7 Dec 2003, Peter Jacobi wrote:
So, I'm still wondering whether Unicode and HTML4 will consider
<span style='color:#00f'>&#x0BB2;</span>&#x0BBE;
valid and it is the task of the user agent to make the best out of it.
I think this is valid.
I
Hello Peter,
At 13:25 03/12/07 +0100, Peter Jacobi wrote:
Dear Doug, All,
BTW, your Unicode test page is marked:
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
This is of course redundant as this is the HTTP default.
Well, the HTTP spec unfortunately still says so, but
Hello Doug, others,
Here is my most probable explanation:
Adelphia recently upgraded to Apache 2.0. The core config file (httpd.conf)
as distributed contains an entry
AddDefaultCharset iso-8859-1
which does what you have described. They probably adopted this
because the comment in the config
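The shipped default described above can be switched off so that each document's own declaration wins; a sketch of the relevant httpd.conf fragment (Apache 2.0 syntax, using the documented AddDefaultCharset directive):

```apache
# httpd.conf -- override the distribution default so documents
# can declare their own charset (e.g. via a META element)
AddDefaultCharset Off
```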
Hello Marion,
IANA won't answer your question. They are just the record keeper;
they don't make any decisions.
If you have a need for identifying a particular kind of language,
then you submit a registration proposal.
Others will then comment on that proposal. If you don't have
At 13:14 03/02/18 -0800, Jonathan Coxhead wrote:
That's a very long-winded way of writing it!
How about this:
#!/usr/bin/perl -pi~ -0777
# program to remove a leading UTF-8 BOM from a file
# works both on STDIN -> STDOUT and in place (with filename as argument)
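The body of that Perl script is cut off above; as an illustration only (not the original program), the same BOM-stripping idea can be sketched in Python:

```python
BOM = b"\xef\xbb\xbf"  # the UTF-8 encoding of U+FEFF

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM, if present; leave everything else untouched."""
    return data[len(BOM):] if data.startswith(BOM) else data
```

For in-place editing like Perl's -pi~, read the file in binary mode, strip, and write the result back.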
At 11:24 03/02/21 -0800, Markus Scherer wrote:
Marco Cimarosti wrote:
BTW, would it be possible to encode XML in SCSU?
Yes. Any reasonable SCSU encoder will stay in the ASCII-compatible
single-byte mode until it sees a character from beyond Latin-1. Thus the
encoding declaration will be
At 13:41 02/10/02 +0900, Martin Duerst wrote:
I'm not sure this is possible with Apache, maybe there is a need
for a RemoveCharset directive similar to RemoveType
(http://httpd.apache.org/docs/mod/mod_mime.html#removetype).
Or maybe there is some other way to get the same result.
If a new
Dear Unicoders,
As announced at the International Unicode Conference in San Jose
the W3C Internationalization Activity has recently been restructured,
and the Internationalization Working Group (WG) and Interest Group (IG)
have been re-chartered. We are sure that this will provide you with
At 12:14 02/10/01 -0400, [EMAIL PROTECTED] wrote:
I agree that 'sniffing' and 'guessing' are ill-defined, and not to be
relied upon. However, I find it a bit 'ill-defined' that there is no
well-defined (web server independent) way for the 'users' to override
the possibly wrong encoding default
At 07:37 02/09/26 +0900, [EMAIL PROTECTED] wrote:
I would be happy if just this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
would be enough to convince the browsers that the page is in UTF-8...
It isn't if the HTTP server claims that the pages it serves are in
ISO 8859-1. A
At 22:25 02/04/19 +0100, Steffen Kamp wrote:
However, when giving the validator an ASCII-only document with a META tag
specifying UTF-16 as encoding (just for testing) it says that it does not
yet support this encoding, so I don't fully trust the validator in this case.
The validator indeed
Just a very small correction:
At 07:19 02/04/22 -0400, James H. Cloos Jr. wrote:
There are other ways as well. Apache will already (if you use the
default configs) add the Content-Language header if you use a filename
like foo.en.html. You could have it also add the charset via a
similar
Dear Unicoders,
I have just submitted draft-w3c-i18n-iri-00.txt to the Internet Drafts
editor. This draft replaces draft-masinter-url-i18n-08.txt. It should be
published in a few hours/days. In the meantime it is available at
http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt.
Hello Yung-Fong,
First, please send potential error reports to [EMAIL PROTECTED]
as indicated in the spec. Second, as somebody else has already
said, the XML Core WG is working on extending the repertoire of
XML Names in XML Blueberry / XML 1.1.
If you have any specific comments, I suggest you
Character-based compression schemes have been suggested by others.
But this is not necessary: you can take any generic data compression
method (e.g. zip, ...) and it will compress very efficiently.
The big advantage of using generic data compression is that
it's already available widely, and in
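As a rough illustration of that point (hypothetical sample text; zlib standing in for "zip,..."), the machinery is already in Python's standard library:

```python
import zlib

def pack(text: str) -> bytes:
    """Compress UTF-8 text with a generic byte-oriented compressor."""
    return zlib.compress(text.encode("utf-8"))

def unpack(blob: bytes) -> str:
    return zlib.decompress(blob).decode("utf-8")

# Repetitive, markup-heavy text compresses very well even though
# the compressor knows nothing about Unicode characters.
sample = "<item>Unicode</item>\n" * 200
assert unpack(pack(sample)) == sample
assert len(pack(sample)) < len(sample.encode("utf-8"))
```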
Dear Unicoders,
W3C organized a workshop co-located with the 20th
Unicode Conference last month in Washington DC,
to discuss the future of the W3C Internationalization
Activity.
The minutes and results of the workshop are now published at
http://www.w3.org/2002/02/01-i18n-workshop
Hello Brant,
This is not really a Web internationalization question.
Therefore I'm forwarding it to the unicode mailing list.
Regards, Martin.
At 08:48 02/02/05 -0500, IDAutomation.com, Inc. wrote:
I am hoping you can help me with a FileMaker task. We sell barcode fonts and
we have several
At 21:44 02/01/06 -0800, James Kass wrote:
Martin Duerst wrote,
(I wrote,)
It would be perfectly correct and might even allow the page to
sport one of those valid-HTML gifs from W3.
But it doesn't. Just tried changing the charset on an NCR Deseret test
page from UTF-8 to US-ASCII. Both
Dear Unicoders,
The deadlines for registrations and submissions for the
W3C Internationalization workshop are approaching rapidly;
please make sure you don't miss them.
Registration deadline:
January 10th, 2002 (Thursday)
(see
At 00:05 02/01/04 -0500, Tex Texin wrote:
Thanks to James Kass, we have a new version of the Unicode examples for
plane 1, that uses UTF-8, instead of NCRs.
So the following link is to the original page that is code page
x-user-defined and uses NCRs for supplementary characters:
At 17:30 01/12/25 -0800, Michael (michka) Kaplan wrote:
From: "$BAk]namdqor(B $BDialamt_dgr"(B [EMAIL PROTECTED]
By the way, does any browser in common use
support the Ruby extensions to HTML?
The 'ruby extensions for HTML' are defined in
http://www.w3.org/TR/ruby/, a W3C recommendation.
I agree with Jungshik that U+76F4 (straight) is possibly the
case where unification went farthest in the sense that it's
the case where average modern readers in various areas might
be most (1) confused if they see the glyph variant they are
not used to.
(1) 'most confused' should not be
Hello James (and everybody else),
Can you please send comments and bug reports on the validator to
[EMAIL PROTECTED]? Sending bug reports to the right address
seriously increases the chance that they get fixed.
Regards, Martin.
At 14:46 01/12/16 -0800, James Kass wrote:
Elliotte Rusty Harold
At 07:16 01/12/14 -0800, James Kass wrote:
Having an HTML validator, like Tidy.exe, which generates errors
or warnings every time it encounters a UTF-8 sequence is
unnerving. It's especially irritating when the validator
automatically converts each string making a single UTF-8
character into two
As the person who implemented UTF-8 checking for http://validator.w3.org,
I beg to disagree. In order to validate correctly, the validator has
to make sure it correctly interprets the incoming byte sequence as
a sequence of characters. For this, it has to know the character
encoding. As an
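A minimal sketch of that first step (illustrative only; the actual W3C validator is a different implementation):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check that a byte stream really is UTF-8 before interpreting
    it as a sequence of characters (strict decoding rejects overlong
    sequences and stray bytes)."""
    try:
        data.decode("utf-8")  # strict by default
        return True
    except UnicodeDecodeError:
        return False
```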
Dear Unicoders,
W3C is holding a workshop on Internationalization to evaluate
the work over the last years and decide on new directions
(in particular guidelines and outreach). Details are
as follows:
Date: 1 February 2002
Location: Omni Shoreham Hotel, Washington DC, USA
The Call for
Dear Unicoders,
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-08.txt
about the internationalization of URIs (called IRIs) has recently
been updated and published.
This has been around for a long time, but we plan to move ahead with it
in the very near future. Please have a look at
I suggest you look at tools that in one way or another produce
SVG. SVG is based on XML and therefore supports Unicode.
Please see http://www.w3.org/Graphics/SVG/ and
http://www.w3.org/Graphics/SVG/SVG-Implementations.htm8#svgedit
and below.
Please note that not all tools may support the same
It's very much working that way in any serious browsers.
Some font formats (e.g. bitmaps for XWindows on Unix)
use layouts corresponding to traditional encodings.
Truetype fonts used on many systems can be directly
accessed by Unicode, but part of the info in a conversion
table is still needed to
At 10:39 01/08/30 +0100, [EMAIL PROTECTED] wrote:
Additionally, if you are thinking of XML (or
HTML) then you can encode *all* Unicode characters in an EUC-encoded
document, by employing numeric character references for characters
outside the EUC character repertoire. Using the same technique,
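Python's codec machinery can demonstrate that technique directly (the sample text here is my own choice, not from the original mail):

```python
# Characters inside the EUC-JP repertoire are encoded as usual;
# anything outside it becomes a numeric character reference.
text = "\u65e5\u672c\u8a9e\U00010400"  # U+10400 (Deseret) is not in EUC-JP
encoded = text.encode("euc_jp", errors="xmlcharrefreplace")
assert encoded.startswith("\u65e5\u672c\u8a9e".encode("euc_jp"))
assert encoded.endswith(b"&#66560;")  # 66560 == 0x10400
```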
Hello David,
What you say is true, but it affects only a very small set of
codepoints, mainly symbols. For more documentation, I recommend
to read http://www.w3.org/TR/japanese-xml/.
Regards, Martin.
At 13:13 01/08/30 -0500, David Starner wrote:
On Thu, Aug 30, 2001 at 09:51:24AM -0700,
There are lots of examples out there, but mostly in legacy encodings.
If you need one in a UTF, just convert it yourself (and make sure
you change or remove 'encoding=euc-jp').
XML mandates that every processor (the receiving end) understands
UTF-8 and UTF-16, but documents can be in other
At 01:44 01/07/21 -0400, [EMAIL PROTECTED] wrote:
In a message dated 2001-07-20 6:19:24 Pacific Daylight Time, [EMAIL PROTECTED]
writes:
You can find a better way to do furigana, and an answer to many
of your questions, at http://www.w3.org/TR/ruby (the Ruby Annotation
Recommendation).
that a document
does not allow C0 control characters, a feature that would
be very important for many cases if the basic XML syntax
would start to allow C0.
Regards, Martin.
At 10:32 01/07/19 -0600, Shigemichi Yazawa wrote:
At Thu, 19 Jul 2001 15:52:39 +0900,
Martin Duerst [EMAIL PROTECTED] wrote
Hello Patrick,
You can find a better way to do furigana, and an answer to many
of your questions, at http://www.w3.org/TR/ruby (the Ruby Annotation
Recommendation).
Regards, Martin.
At 18:40 01/07/19 -0400, Patrick Andries wrote:
Just a small question about annotation characters.
If I
At 14:30 01/07/17 -0700, Mark Davis wrote:
In that case the content of the field is not text but an octet string,
and you need to do something different, like base64-ing it.
The content in the database is not an octet string: it is a text field that
happens to have a control code -- a
For people interested in new scripts, and new uses
of existing scripts :-)
http://www.google.com/intl/xx-hacker/
Regards, Martin.
Hello Elliotte,
Just two points:
- If you are suggesting that discussion move to xml-dev, can you
please give the full address of that mailing list?
- I suggest you/we don't cross-post [EMAIL PROTECTED], because
it's not an issue the Unicode consortium has to decide.
(I'm just
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote:
Martin Dürst wrote:
There is about 5% of a justification
for having a 'signature' on a plain-text, standalone file (the reason
being that it's somewhat easier to detect that the file is UTF-8 from the
signature than to read through
Hello Roozbeh
At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:
Well, I received a UTF-8 email from Microsoft's Dr International today. It
was a multipart/alternative, with both the text/plain and text/html
in UTF-8. Well, nothing interesting yet, but the interesting point was
that the HTML
At 11:28 01/04/26 -0700, Markus Scherer wrote:
Paul Deuter wrote:
I am wondering if there isn't a need for the Unicode Spec to also
dictate a way of encoding Unicode in an ASCII stream. Perhaps
How many more ways do we need?
To be 8-bit-friendly, we have UTF-8.
To get everything into ASCII
Hello Paul,
At 19:41 01/04/25 -0700, Paul Deuter wrote:
I am struggling to figure out the correct method for encoding Unicode
characters in the
query string portion of a URL.
There is a W3C spec that says the Unicode character should be converted to
UTF-8 and
then each byte should be encoded as
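That scheme (convert the character to UTF-8, then escape each byte as %XX) is also what Python's urllib applies by default; a small illustration:

```python
from urllib.parse import quote, unquote

# 'é' is U+00E9, whose UTF-8 form is the two bytes 0xC3 0xA9,
# so it becomes two %XX escapes in the query string.
encoded = quote("r\u00e9sum\u00e9")
assert encoded == "r%C3%A9sum%C3%A9"
assert unquote(encoded) == "r\u00e9sum\u00e9"
```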
At 15:02 01/04/26 -0700, Paul Deuter wrote:
Based on the responses, I guess my original question/problem was not
very well written.
The %XX idea does not work because it is already in use by lots of
software to encode many different character sets. So again we need
something that identifies
Hello Mike,
At 19:09 01/04/26 -0600, Mike Brown wrote:
W3C specifies to use %-encoded UTF-8 for URLs.
I think that's an overstatement.
Neither the W3C nor the IETF make such a specification.
True. Neither W3C nor IETF make such a general statement,
because we can't just remove the about 10
At 09:29 01/04/17 -0500, [EMAIL PROTECTED] wrote:
In a perfect world, we would probably have an enclosing symbol (e.g.
'\4E00') so that the number can be variable length.
tuning into another language channel
In Perl the notation is \x{...}, where ... is a hex-digit sequence:
\x{41} is LATIN
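As an aside (not from the original thread), Python offers several spellings of the same escape:

```python
# Three equivalent ways to write U+0041 in a Python string literal:
s1 = "\u0041"                      # exactly four hex digits
s2 = "\N{LATIN CAPITAL LETTER A}"  # by Unicode character name
s3 = chr(0x41)                     # from an integer code point
assert s1 == s2 == s3 == "A"
```

Unlike Perl's variable-length \x{...}, Python's \u takes exactly four hex digits and \U exactly eight.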
Hello Florian - There is no official or coordinated review of IETF
documents. Because of the volunteer nature of the IETF, it mostly
depends on individuals.
I have been in contact with the USEFOR group for a while.
What particular serious problem are you speaking about?
If you know about a
Hello Florian,
Of course, KC/KD-normalization is not sufficient. The problem
already exists in ASCII. I/l/1 and 0/O can easily be confused.
It will always be necessary for people to think a bit when creating
their email addresses,...
On the other hand, when identifiers can be written in various
Ruby Annotation (http://www.w3.org/TR/ruby) and
XHTML(TM) 1.1 - Module-based XHTML (http://www.w3.org/TR/xhtml11)
became W3C Proposed Recommendations on April 6, 2001.
Abstract of 'Ruby Annotation':
"Ruby" are short runs of text alongside the base text, typically used
in East Asian documents