https://issues.apache.org/bugzilla/show_bug.cgi?id=48899

           Summary: Guessing the URI charset should solve a lot of problems
           Product: Tomcat 6
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Connectors
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: mich...@wyraz.de


Hi Tomcat connector developers,

Tomcat's connectors have options to either set the URI encoding to a fixed
charset or to use the body's encoding (set by request.setCharacterEncoding)
to decode the URI (if neither is set, ISO-8859-1 is used).
I found an article (with code) which demonstrates that the charset of data can
easily be guessed: http://glaforge.free.fr/wiki/index.php?wiki=GuessEncoding .
Since the most common charsets for URI encoding are ISO-8859-1 (because it is
the default for URI encoding) and UTF-8 (because it is used by most
multi-language websites), the possible choices are very clear.

So I'd suggest adding an option to the connectors to guess the charset used
when decoding URI parts. Many URI decoding issues would be solved that way
(e.g. when the URI is decoded as UTF-8 and a user types an umlaut into the
address bar, the browser might encode it as ISO-8859-1 and the app has no way
to fix this).
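
To illustrate the idea, here is a minimal sketch of such a guess, assuming only
the two candidate charsets named above. It is not Tomcat code and not the code
from the linked article; the class name UriCharsetGuess and the helpers
percentDecode and decodeGuessing are made up for this example. It tries a
strict UTF-8 decode first and falls back to ISO-8859-1 when the bytes are not
valid UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Tomcat.
class UriCharsetGuess {

    // Turn %XX escapes into raw bytes; other characters are assumed to be ASCII.
    static byte[] percentDecode(String uri) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c == '%' && i + 2 < uri.length()) {
                // Throws NumberFormatException on a malformed escape.
                out.write(Integer.parseInt(uri.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write((byte) c);
            }
        }
        return out.toByteArray();
    }

    // Guess the charset: strict UTF-8 first, ISO-8859-1 as the fallback.
    static String decodeGuessing(String uri) {
        byte[] raw = percentDecode(uri);
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8; every byte sequence is valid ISO-8859-1.
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // The same umlaut, once sent as UTF-8 and once as ISO-8859-1.
        System.out.println(decodeGuessing("/m%C3%BCller"));
        System.out.println(decodeGuessing("/m%FCller"));
    }
}
```

The guess is not free of ambiguity: any byte sequence that is valid UTF-8 is
also valid ISO-8859-1, so this sketch always prefers UTF-8 in that case. An
ISO-8859-1 URI whose bytes happen to form valid UTF-8 would be decoded wrongly.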

Regards, Michael.
