Hello Florian,
You are right that there is currently no well-defined way to
include arbitrary characters in URIs, or to interpret URIs
and find out which characters they contain. So if you have
a file with an a-umlaut and a Euro sign in its name, then to
construct an http URI for it, you have to know the
encoding that the server exposes. In many cases this is the
same encoding as the one actually used in the file
system itself, but it may not be.
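To make the ambiguity concrete, here is a small sketch in Python
(the language and the path are only my choices for illustration,
not anything the specs prescribe). It takes one escaped path and
interprets it under two assumed server encodings:

```python
from urllib.parse import unquote

# Hypothetical escaped path. %E4 is a-umlaut if the server uses Latin-1.
escaped = "/files/%E4.txt"

# Interpretation 1: the server exposes file names as ISO-8859-1 (Latin-1).
as_latin1 = unquote(escaped, encoding="iso-8859-1")

# Interpretation 2: the server exposes file names as UTF-8.
# A lone 0xE4 byte is not valid UTF-8, so it becomes U+FFFD here.
as_utf8 = unquote(escaped, encoding="utf-8", errors="replace")

print(repr(as_latin1))
print(repr(as_utf8))
```

The URI itself carries no hint about which of the two readings is
the intended one; that knowledge has to come from outside.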
Efforts are under way to improve on the
current state. You can find an overview, including the
document James already mentioned in another mail, at
http://www.w3.org/International/O-URL-and-ident.html.
If you want to stay in sync with this work
and be able to benefit from it later,
you should set up your server so that it
exposes file names as UTF-8.
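As a rough sketch of what that means for a client, assuming a
server that really does expose file names as UTF-8 (the host
www.example.org and the file name are made up, and Python is
just used for illustration):

```python
from urllib.parse import quote

# Hypothetical file name: a-umlaut, Euro sign, then ".txt".
name = "\u00e4\u20ac.txt"

# Encode the name as UTF-8, then %-escape every non-ASCII byte.
utf8_uri = "http://www.example.org/" + quote(name, encoding="utf-8")
print(utf8_uri)

# Latin-1 has no code point for the Euro sign, so the same name
# cannot be escaped at all under that encoding.
try:
    quote(name, encoding="iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print("Latin-1 can represent the name:", latin1_ok)
```

With UTF-8 both characters survive as multi-byte escape
sequences, which is the main reason for recommending it.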
Regards, Martin.
At 00/08/01 14:06 +0200, Florian Große-Coosmann wrote:
>Hey,
>
>Maybe I'm off topic, but I have a question about RFC 2616 (HTTP 1.1).
>URLs used, e.g., in PUT or GET methods may include non-US-ASCII
>characters. RFC 2616 directs the problem to RFC 2396 (URIs) which
>claims that only some characters should be printed as is and
>others should be escaped by "%" HEXNIBBLE1 HEXNIBBLE2.
>
>Furthermore, RFC 2396 directs the problem of the default target
>codepage back to the application of RFC 2396, RFC 2616 in this
>case.
>
>Does anybody know the default codepage in URIs of HTTP?
>
>To illustrate the problem:
>Imagine some files with different foreign characters, e.g.
>German umlaut a (HTML auml, Unicode 228, represented in Latin1) and
>the Euro sign (HTML euro, Unicode 8364, not represented in Latin1).
>What happens if a file name including one or both of these
>characters is used?
>The RFC 2396 conforming name requires the usage of "%". But of
>what character set? ISO 10646 (Unicode) can't be used because
>of the length restriction in "%". Latin1 can't be used because
>it doesn't contain the Euro sign. UTF-8 or another MBCS can
>convert all characters to RFC 2396 conforming characters, but
>this isn't mentioned in RFC 2616.
>
>What is the appropriate way of handling special characters,
>and what do other people with very different
>characters, like Chinese, Thai, etc., do?
>
>Thanks, Florian
