What character encoding would you have the server use to
decode the headers? :)

If the server has to make a character encoding assumption
about something, why not the URL?  IMHO, it would be backwards
to require the server to look at something "internal", i.e. the
headers, in order to figure out what to do with the parent,
i.e. the request and its URL.

The RFCs don't require servers to handle more than ASCII,
at least for HTTP URLs, as far as I can recall.  Going beyond
ASCII in the URL is going beyond guaranteed behavior, and until
the RFCs are updated, this won't change.  Fortunately, the
server doesn't need to mess with the URL query string during its
portion of the processing.  Thus, slipping non-ASCII characters
into the query string is a lot less likely to cause problems.

In Java land, with Unicode-based strings, a Java-based web
server using UTF-8 has a good chance of doing what you want.
The RFCs address more than just Java-based web servers, so
they aren't likely to be changed or updated just because
something is easy in Java.

Note that the character encoding of servlet mappings in
web.xml isn't explicitly covered in the specs.  You would be
on shakier ground if you wanted to use non-ASCII characters
in the url-pattern of a servlet mapping.

Just my 2 cents.

Cheers,
Larry

> -----Original Message-----
> From: Edward Toro [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, March 11, 2004 1:47 PM
> To: Tomcat Users List
> Subject: RE: international filenames inaccessible
> 
> 
> It still seems incorrect for the server to decide which type 
> of encoding to use.  To support the portability of webapps, 
> shouldn't each webapp decide its own encoding?  Otherwise, 
> once "URIEncoding=UTF-8" is set, every webapp on the server 
> has to send international characters in UTF-8.  Instead, each 
> webapp should specify the encoding it wants to use in a header.
> 
> So the worthwhile change would be, as Yan said, to default 
> useBodyEncodingForURI to true.  But if that only applies 
> to the query string, then it only solves part of the problem.
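For the query-string part, the per-webapp choice Ed asks for can be sketched as the application decoding the raw query string itself with whatever charset it expects, which is roughly what letting the body encoding govern the query string amounts to. This is a minimal sketch, not Tomcat's code: `QueryStringDemo` and `parse` are hypothetical names, it assumes Java 10 or later for the `URLDecoder.decode(String, Charset)` overload, and a repeated parameter name simply keeps its last value.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes a raw query string with a charset chosen by the webapp,
// instead of relying on one server-wide setting. Requires Java 10+.
final class QueryStringDemo {

    static Map<String, String> parse(String rawQuery, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Form-encoding rules: %XX escapes become bytes, bytes become chars via
            // the given charset, and '+' becomes a space. A repeated name overwrites.
            params.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = "name=Jos%C3%A9&city=S%C3%A3o+Paulo";
        System.out.println(parse(raw, StandardCharsets.UTF_8));
        System.out.println(parse(raw, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 the `name` value decodes to "José"; with ISO-8859-1 the very same bytes give "JosÃ©", which is the kind of mismatch a server-wide default can introduce.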
> 
> -ET
> 
> -----Original Message-----
> From: Larry Isaacs [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 11, 2004 1:23 PM
> To: Tomcat Users List
> Subject: RE: international filenames inaccessible
> 
> 
> This has been discussed on tomcat-dev pretty thoroughly
> already.  Tomcat 4.1.27 and earlier were hard-coded to
> use UTF-8 for decoding URLs.  This allowed you to easily
> develop a dependency on this "feature" and then later
> discover your webapp isn't portable.  Tomcat 4.1.30 and
> 5.0.19 fix this by making you explicitly change the default,
> which supports portability, to something that does not.
> Hence, no surprises with respect to portability.
> 
> Note that URL query string encoding is affected by the
> useBodyEncodingForURI attribute.  Tomcat 4.1.30 defaults
> this to true, to maintain the same behavior as prior
> Tomcat 4.1.x versions. In Tomcat 5.0.19 it defaults to
> false.  If you try to serve some webapps that aren't
> using UTF-8 everywhere, you could be impacted by this.
> 
> Cheers,
> Larry
> 
> > -----Original Message-----
> > From: Edward Toro [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, March 11, 2004 12:58 PM
> > To: Tomcat Users List
> > Subject: RE: international filenames inaccessible
> > 
> > 
> > Wow, that worked!
> > 
> > The problem may actually be in Java rather than Tomcat.  I 
> > set the DEBUG value to 1001 on a Tomcat 5 server and a 4.1.18 server 
> > to check the request info.  The call to getServletPath() 
> > returns a different value between 4.1.18 and the latest 
> > releases.  I suppose previously Java did the decoding, but 
> > now the servlet is responsible for the decoding?  Or maybe 
> > the newer servers specify ISO-8859-1 instead of letting Java 
> > do the work?
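If the container has already decoded the path as ISO-8859-1, a servlet you write yourself can still recover a UTF-8 name, because ISO-8859-1 maps every byte 0x00-0xFF to the character with the same code, so the original bytes round-trip exactly. This is a minimal workaround sketch under that assumption: `redecodeAsUtf8` is a hypothetical name, its input would be something like the value returned by `getServletPath()`, and it does nothing for the default servlet's own file lookup, which is what the URIEncoding setting addresses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical workaround helper, assuming the container decoded the URL with ISO-8859-1.
final class RedecodeDemo {

    static String redecodeAsUtf8(String decodedAsLatin1) {
        // ISO-8859-1 is a 1:1 byte/char mapping for U+0000..U+00FF, so this restores the
        // original bytes. Characters above U+00FF would become '?', so this only makes
        // sense for strings that really were decoded as ISO-8859-1.
        byte[] original = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What ISO-8859-1 decoding of "/caf%C3%A9.txt" yields.
        String seen = "/caf\u00c3\u00a9.txt";
        System.out.println(redecodeAsUtf8(seen).equals("/caf\u00e9.txt")); // true
    }
}
```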
> > 
> > It's really annoying that this value overrides the use of the 
> > "file.encoding" System property.  A previous "solution" 
> > mentioned using that, but I couldn't get it to work.
> > 
> > IMO, the server should be able to serve files with 
> > international file names without any extra configuration, 
> > especially since it used to do it before.  UTF-8 is becoming 
> > the standard for international character transmission over 
> > the net, if it's not the standard already.  And UTF-8 looks 
> > exactly like ASCII for all the values in the ASCII range.  Is 
> > this something worth bringing up in the Tomcat-Dev group?
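Ed's point that UTF-8 looks exactly like ASCII in the ASCII range is easy to check: every code point 0-127 encodes to the single identical byte, while anything beyond that takes two or more bytes. A small check with hypothetical names:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check of UTF-8's compatibility with ASCII.
final class AsciiCompatDemo {

    // True if each code point 0-127 encodes in UTF-8 as exactly one byte with the same value.
    static boolean utf8MatchesAscii() {
        for (int c = 0; c < 128; c++) {
            byte[] encoded = String.valueOf((char) c).getBytes(StandardCharsets.UTF_8);
            if (encoded.length != 1 || encoded[0] != c) {
                return false;
            }
        }
        return true;
    }

    // Number of bytes a string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8MatchesAscii());            // true
        System.out.println(utf8Length("/index.html"));     // 11: one byte per character
        System.out.println(utf8Length("/caf\u00e9.html")); // 11: 9 ASCII chars plus 2 bytes for e-acute
    }
}
```

This is why a UTF-8 default would not change how existing all-ASCII URLs are decoded.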
> > 
> > -ET
> > 
> > -----Original Message-----
> > From: Larry Isaacs [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 11, 2004 12:36 PM
> > To: Tomcat Users List
> > Subject: RE: international filenames inaccessible
> > 
> > 
> > See the "URIEncoding" attribute described at:
> > 
> > http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/http.html
> > 
> > The same attribute applies to Tomcat 4.1.30 as well.
> > 
> > I'm not aware of any specs that guarantee behavior when using
> > non-ASCII characters in the URL in this fashion, but it might
> > work.
> > 
> > Cheers,
> > Larry
> > 
> > > -----Original Message-----
> > > From: Edward Toro [mailto:[EMAIL PROTECTED] 
> > > Sent: Thursday, March 11, 2004 11:10 AM
> > > To: Tomcat Users List
> > > Subject: international filenames inaccessible
> > > 
> > > 
> > > Does anyone know if Tomcat 5 is supposed to serve files with 
> > > international characters in their filenames?  It used to work 
> > > in Tomcat 4.1.24, but stopped working in 4.1.30 and doesn't 
> > > work in 5.0.19.
> > > 
> > > In all the versions of Tomcat I've seen, the international 
> > > characters are converted using URLEncoder.encode(filename, "UTF-8")
> > > as per the standard at
> > > http://www.w3.org/International/O-URL-code.html.  But the
> > > broken servers return 404 when you try
> > > to access international filenames like that.
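Ed builds his links with `URLEncoder.encode(filename, "UTF-8")`. One detail worth knowing: `URLEncoder` implements HTML form encoding (application/x-www-form-urlencoded), which turns a space into `+`, while in a URL path `+` is an ordinary character. A minimal sketch for encoding a file name as a single path segment, assuming Java 10 or later for the `Charset` overload; `encodePathSegment` is a hypothetical name. The resulting link only resolves if the server also decodes the path as UTF-8.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: percent-encodes one path segment (such as a file name) as UTF-8.
final class FileNameLinkDemo {

    static String encodePathSegment(String segment) {
        // URLEncoder already turns a literal '+' into "%2B", so any '+' left afterwards
        // came from a space and can safely become "%20", the path-safe form.
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodePathSegment("r\u00e9sum\u00e9 2004.pdf")); // r%C3%A9sum%C3%A9%202004.pdf
        System.out.println(encodePathSegment("a+b.txt"));                   // a%2Bb.txt
    }
}
```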
> > > 
> > > The code to interpret the encoding is provided on that w3.org 
> > > page.  Why isn't it part of the server anymore?
> > > 
> > > -Ed
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]