[ https://issues.apache.org/jira/browse/HTTPCLIENT-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ortwin Glück resolved HTTPCLIENT-642.
-------------------------------------

    Resolution: Invalid

Ralf,

You have a form that accepts ISO-8859-1. But then your user enters a Chinese
character that CANNOT be represented in that encoding. That's why your
browser falls back to an HTML numeric character reference, "&#33021;", which
it additionally URL-encodes to "%26%2333021%3B". I know of no standard that
describes this behaviour; it looks completely arbitrary. As a matter of fact,
entering non-ISO-8859-1 characters in such a form is not allowed, and the
result is not well-defined. If you need Chinese characters, use a UTF-8
capable form.
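A rough Java sketch of the fallback described above, assuming the browser simply substitutes a decimal character reference for every code point ISO-8859-1 cannot hold and then URL-encodes the result. The class and method names are hypothetical, and this only emulates what the browser appeared to send; it is not a specified algorithm. The UTF-8 variant shows what a UTF-8 capable form would submit instead.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class BrowserFormEncoding {

    // Hypothetical emulation of the browser fallback: every code point that
    // ISO-8859-1 cannot represent becomes a decimal character reference.
    static String substituteEntities(String value) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        value.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (latin1.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    // The substituted value is then URL-encoded for the GET query string,
    // so '&', '#' and ';' become %26, %23 and %3B, and the space becomes '+'.
    static String toFormValue(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(substituteEntities(value), "ISO-8859-1");
    }

    // What a UTF-8 form would send instead: the character's UTF-8 bytes.
    static String utf8FormValue(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String typed = "Ralf\u80FD GMX"; // U+80FD is the character from the report
        System.out.println(substituteEntities(typed)); // Ralf&#33021; GMX
        System.out.println(toFormValue(typed));        // Ralf%26%2333021%3B+GMX
        System.out.println(utf8FormValue(typed));      // Ralf%E8%83%BD+GMX
    }
}
```

Note that the server only ever sees "&#33021;" after decoding, so it cannot tell whether the user typed the character or literally typed those eight characters; this is part of why the result is not well-defined.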

Now, in your example the url variable contains a # character that is NOT used
as a separator for the reference (anchor) part of the URL. The # character is
reserved in URIs for exactly that purpose, so it must be escaped (as %23) when
used inside a GET parameter value, as in this example. Because it is not, the
URI parser treats everything after the # as the fragment, which is why the
query ends at "name=Ralf&". This URL is therefore not properly escaped, and
the URI class's behaviour is correct.
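To illustrate the escaping rule, here is a sketch using the JDK's java.net.URI as a stand-in for HttpClient's own URI class (assumption: both follow the generic URI syntax in which # starts the fragment). The buildQuery helper is hypothetical; the redacted e-mail parameter from the report is left out, and %20 replaces the space in the unescaped string because java.net.URI rejects a literal space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

public class QueryEscaping {

    // Hypothetical helper: appends name/value pairs to the path, URL-encoding
    // each name and value so reserved characters such as '#', '&' and ';'
    // stay inside the value instead of being read as URI delimiters.
    static String buildQuery(String path, String[][] params, String charset)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(path);
        char sep = '?';
        for (String[] p : params) {
            sb.append(sep)
              .append(URLEncoder.encode(p[0], charset))
              .append('=')
              .append(URLEncoder.encode(p[1], charset));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Unescaped '#': the parser ends the query at '#'.
        URI broken = new URI(
                "/hp/index.php?address=addr&name=Ralf&#33021;%20GMX&subject=Newsletter");
        System.out.println(broken.getRawQuery());    // address=addr&name=Ralf&
        System.out.println(broken.getRawFragment()); // 33021;%20GMX&subject=Newsletter

        // Escaped value: the whole query survives and there is no fragment.
        String fixed = buildQuery("/hp/index.php", new String[][] {
                {"address", "addr"},
                {"name", "Ralf&#33021; GMX"},
                {"subject", "Newsletter"}}, "ISO-8859-1");
        URI ok = new URI(fixed);
        // address=addr&name=Ralf%26%2333021%3B+GMX&subject=Newsletter
        System.out.println(ok.getRawQuery());
        System.out.println(ok.getRawFragment()); // null
    }
}
```

With HttpClient 3.x, a string built this way is already escaped, so it would presumably be handed over as such (for example via the URI(String, boolean escaped) constructor with escaped = true) rather than escaped a second time.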

Ortwin

> browser encoded UTF-8 character gets truncated by URI upon escaping
> -------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-642
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-642
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 3.0.1
>            Reporter: Ralf Hauser
>
> A Mozilla 1.5.0.10 GET request from an ISO-8859-1 form, where a user
> inadvertently entered a Chinese character, arrives at my Tomcat as
> String url = "/hp/index.php?address=addr&[EMAIL PROTECTED]&name=Ralf&#33021; GMX&subject=Newsletter"
> (the Chinese character 能 being encoded as &#33021;)
>     URI uri = new URI(url, false, "ISO-8859-1");
>     GetMethod httpGet = new GetMethod(uri.getEscapedURI());
>     log.debug(httpGet.getURI());
>   "/hp/index.php?address=addr&[EMAIL PROTECTED]&name=Ralf&"
> How should I deal with this until v4 is out? Will this be fixed there?
>     
> see also HTTPCLIENT-577

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

