Kadeem Hassam created HTTPCORE-531:
--------------------------------------
Summary: TokenParser breaks at double quotes when parsing unquoted
values
Key: HTTPCORE-531
URL: https://issues.apache.org/jira/browse/HTTPCORE-531
Project: HttpComponents HttpCore
Issue Type: Bug
Components: HttpCore
Affects Versions: 4.4.9
Reporter: Kadeem Hassam
Assume a query string like {{a=b"c&d=e}}.
The mapping for that query string would reasonably be expected to be
{code:java}
[a=b"c, d=e]
{code}
Actual result using httpcore 4.4.9 is
{code:java}
[a=bc&d=e]
{code}
Example code:
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.http.client.utils.URLEncodedUtils;

class QueryParser {
    public static void main(String[] args) {
        // Parse the query string with '&' as the only parameter separator.
        System.out.println(URLEncodedUtils.parse("a=b\"c&d=e",
                StandardCharsets.UTF_8, '&'));
    }
}
{code}
{{URLEncodedUtils}} from {{httpclient}} uses the {{TokenParser}} in
{{httpcore}}.
After successfully parsing the name ({{a}}), the value is parsed using the
{{parseValue(CharArrayBuffer, ParserCursor,
BitSet)}}[[link|https://github.com/apache/httpcomponents-core/blob/4.4.x/httpcore/src/main/java/org/apache/http/message/TokenParser.java#L119-L144]]
method.
Because the first character is neither a delimiter nor a double quote,
{{parseValue}} calls {{copyUnquotedContent(CharArrayBuffer, ParserCursor, BitSet,
StringBuilder)}}[[link|https://github.com/apache/httpcomponents-core/blob/4.4.x/httpcore/src/main/java/org/apache/http/message/TokenParser.java#L205-L221]],
which returns when the double quote is reached
([[link|https://github.com/apache/httpcomponents-core/blob/4.4.x/httpcore/src/main/java/org/apache/http/message/TokenParser.java#L213-L214]])
instead of when the delimiter is reached.
{{parseValue}} then continues parsing the value, but as quoted content this time
(because the current position is now a quote character). Copying quoted content
reasonably does not stop at the delimiter set, so with no closing quote it
consumes the rest of the query string.
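
The control flow described above can be reproduced without the library. The following is only a simplified model of that flow, not the actual {{TokenParser}} source: the class name {{TokenParserSketch}} and the {{String}}/index signature are made up for illustration, and whitespace and escape handling are left out.

```java
import java.util.BitSet;

public class TokenParserSketch {

    // Simplified model of the value-parsing loop: alternates between an
    // unquoted copy and a quoted copy until a delimiter or the end of input.
    static String parseValue(String s, int start, BitSet delimiters) {
        StringBuilder dst = new StringBuilder();
        int pos = start;
        while (pos < s.length()) {
            char current = s.charAt(pos);
            if (delimiters.get(current)) {
                break;
            }
            if (current == '"') {
                // Quoted copy: skip the opening quote, then copy everything
                // up to the closing quote, ignoring the delimiter set.
                pos++;
                while (pos < s.length() && s.charAt(pos) != '"') {
                    dst.append(s.charAt(pos++));
                }
                if (pos < s.length()) {
                    pos++; // skip the closing quote, if there is one
                }
            } else {
                // Unquoted copy: stops at a delimiter OR at a double quote.
                while (pos < s.length()) {
                    char c = s.charAt(pos);
                    if (delimiters.get(c) || c == '"') {
                        break;
                    }
                    dst.append(c);
                    pos++;
                }
            }
        }
        return dst.toString();
    }

    public static void main(String[] args) {
        BitSet delimiters = new BitSet();
        delimiters.set('&');
        // Start at index 2, just after "a=".
        System.out.println(parseValue("a=b\"c&d=e", 2, delimiters)); // prints bc&d=e
    }
}
```

In this model the unquoted copy hands over to the quoted copy at the stray quote, which matches the {{bc&d=e}} value reported above.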
Other URI parsers, such as Python's {{urllib.parse}}, parse the query string as expected.
{noformat}
Python 3.6.1 (default, Mar 23 2017, 13:04:44) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse
>>> urllib.parse.parse_qs('a=b"c&d=e')
{'a': ['b"c'], 'd': ['e']}
{noformat}
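
For comparison, the expected splitting can be sketched in plain Java. This is a minimal illustration, assuming only {{&}} separates parameters and that a double quote has no special meaning; the class name {{ExpectedQuerySplit}} is hypothetical and percent-decoding is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedQuerySplit {

    // Splits a query string on '&' only; a double quote is treated as an
    // ordinary character. No percent-decoding is performed.
    static List<String> split(String query) {
        List<String> pairs = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(split("a=b\"c&d=e")); // prints [a=b"c, d=e]
    }
}
```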
Although I haven't explicitly tested with {{httpcore5}}, the code for
{{TokenParser}} appears equivalent to {{4.4.9}}.