[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535698#comment-16535698
 ] 

Oleg Kalnichevski commented on HTTPCLIENT-1927:
-----------------------------------------------

[~kadeem.hassam_unbounce] {{TokenParser#parseValue}} is absolutely correct in 
handling an unescaped double quote in a token value. However, 
{{URLEncodedUtils#parse}} should be using {{TokenParser#parseToken}} instead of 
{{TokenParser#parseValue}}, as URL encoding has a different character escaping 
scheme.

I'll commit this fix once the 4.5.6 RC1 release vote has closed.

https://github.com/ok2c/httpclient/commit/5a546db63c00fc6b5f5f6f3b780577663e2edb46

Oleg

> URLEncodedUtils#parse breaks at double quotes when parsing unquoted values
> --------------------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1927
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1927
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (async), HttpClient (classic)
>    Affects Versions: 4.5.5, 4.5.6
>            Reporter: Kadeem Hassam
>            Priority: Minor
>
> Assume a query string like {{a=b"c&d=e}}
> The mapping for that query string would reasonably be expected to be
> {code:java}
> [a=b"c, d=e]
> {code}
> Actual result using httpcore 4.4.9 is
> {code:java}
> [a=bc&d=e]
> {code}
> Example code:
> {code:java}
> import java.nio.charset.StandardCharsets;
> import org.apache.http.client.utils.URLEncodedUtils;
> class QueryParser {
>     public static void main(String[] args) {
>         System.out.println(URLEncodedUtils.parse("a=b\"c&d=e",
>                 StandardCharsets.UTF_8, '&'));
>     }
> }
> {code}
> {{URLEncodedUtils}} from {{httpclient}} relies on the {{TokenParser}} in 
> {{httpcore}}.
> After successfully parsing the name ({{a}}), the value is parsed using the 
> {{parseValue(CharArrayBuffer, ParserCursor, 
> BitSet)}}[[link|https://github.com/apache/httpcomponents-core/blob/4.4.x/httpcore/src/main/java/org/apache/http/message/TokenParser.java#L119-L144]]
>  method.
> Because the first character is neither a delimiter nor a double quote, this ends up 
> calling {{copyUnquotedContent(CharArrayBuffer, ParserCursor, BitSet, 
> StringBuilder)}}[[link|https://github.com/apache/httpcomponents-core/blob/4.4.x/httpcore/src/main/java/org/apache/http/message/TokenParser.java#L205-L221]]
>  which ends up returning when the double quote is reached 
> ([[link|https://github.com/apache/httpcomponents-core/blob/4.4.x/httpcore/src/main/java/org/apache/http/message/TokenParser.java#L213-L214]])
>  instead of when the delimiter is reached.
> {{parseValue}} then continues parsing the value, but as quoted content this 
> time (because the current position is now a quote character). Copying quoted 
> content reasonably does not break on the delimiter set, but here it ends up 
> consuming the rest of the query string.
> Other URI parsers, such as Python's {{urllib.parse}}, parse the query string as expected.
> {noformat}
> Python 3.6.1 (default, Mar 23 2017, 13:04:44) [GCC] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import urllib.parse
> >>> urllib.parse.parse_qs('a=b"c&d=e')
> {'a': ['b"c'], 'd': ['e']}
> {noformat}
> Although I haven't explicitly tested with {{httpcore5}}, the code for 
> {{TokenParser}} appears equivalent to {{4.4.9}}.
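
The behaviour described in the quoted report, and the difference between value-style and token-style parsing mentioned in the comment, can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it does not use or reproduce the real {{TokenParser}} or {{URLEncodedUtils}} code, the class and method names ({{QuoteHandlingSketch}}, {{read}}, {{parse}}) are hypothetical, and backslash escapes, whitespace handling and URL decoding are deliberately left out. The {{quoteAware}} flag stands in for the choice between treating a double quote as the start of quoted content and treating it as an ordinary character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the two parsing styles discussed in this
// issue. It is NOT the HttpCore TokenParser implementation.
class QuoteHandlingSketch {

    // Reads characters from s starting at pos[0] until one of delims is hit.
    // If quoteAware is true, a double quote toggles "quoted" mode, the quote
    // itself is dropped, and delimiters inside quoted mode are ignored.
    // If quoteAware is false, a double quote is copied like any other character.
    static String read(String s, int[] pos, String delims, boolean quoteAware) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (!inQuotes && delims.indexOf(c) >= 0) {
                break; // stop at an unquoted delimiter, leaving it unconsumed
            }
            pos[0]++;
            if (quoteAware && c == '"') {
                inQuotes = !inQuotes;
                continue;
            }
            out.append(c);
        }
        return out.toString();
    }

    // Splits a query string into "name=value" entries, reading each value
    // either in the quote-aware style or in the plain token style.
    static List<String> parse(String query, boolean quoteAware) {
        List<String> pairs = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < query.length()) {
            String name = read(query, pos, "=&", false);
            String value = "";
            if (pos[0] < query.length() && query.charAt(pos[0]) == '=') {
                pos[0]++; // skip '='
                value = read(query, pos, "&", quoteAware);
            }
            if (pos[0] < query.length()) {
                pos[0]++; // skip '&'
            }
            pairs.add(name + "=" + value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String query = "a=b\"c&d=e";
        System.out.println(parse(query, true));  // prints [a=bc&d=e]
        System.out.println(parse(query, false)); // prints [a=b"c, d=e]
    }
}
```

In this model the quote-aware reader swallows the rest of the query string once it sees the unmatched double quote, which matches the result in the report, while the plain token reader stops at the next {{&}} and yields the two expected pairs.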



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
