[
https://issues.apache.org/jira/browse/HTTPCLIENT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486817#comment-13486817
]
Thibaut commented on HTTPCLIENT-1257:
-------------------------------------
It also fails when you request the percent-encoded URL, which is what is
transferred over the wire in both variants:
http://handheld.vn/content.php?4052-%C4%90%C3%A1nh-gi%C3%A1-m%C3%A1y-t%C3%ADnh-b%E1%BA%A3ng-Kindle-Fire-HD-7-inch
2012-10-30 12:26:20,859 DEBUG http.wire: >> "GET
/content.php?4052-%C4%90%C3%A1nh-gi%C3%A1-m%C3%A1y-t%C3%ADnh-b%E1%BA%A3ng-Kindle-Fire-HD-7-inch
HTTP/1.1[\r][\n]" [main]
2012-10-30 12:26:20,860 DEBUG http.wire: >> "Accept-Charset:
ISO-8859-1,utf-8;q=0.7,*;q=0.7[\r][\n]" [main]
2012-10-30 12:26:20,860 DEBUG http.wire: >> "Accept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8[\r][\n]" [main]
2012-10-30 12:26:20,860 DEBUG http.wire: >> "Accept-Language:
en-gb,en;q=0.5[\r][\n]" [main]
....
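
For reference, the escapes in the request line above are the UTF-8 bytes of
the Vietnamese characters, percent-encoded. A minimal sketch using only the
JDK (the class and method names are made up for illustration; note that
java.net.URLEncoder targets form encoding, so it is only a convenient
approximation for hyphen-separated text like this):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeSketch {

    // Hypothetical helper: percent-encodes the UTF-8 bytes of a URL fragment.
    // URLEncoder leaves letters, digits, '.', '-', '*' and '_' untouched.
    static String encodeUtf8(String fragment) throws UnsupportedEncodingException {
        return URLEncoder.encode(fragment, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // "\u0110\u00e1nh-gi\u00e1" is the start of the slug from the URL in
        // this issue, written with escapes so the source file stays ASCII.
        System.out.println(encodeUtf8("\u0110\u00e1nh-gi\u00e1"));
        // prints %C4%90%C3%A1nh-gi%C3%A1
    }
}
```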
> Header location automatically converted to ASCII even though location can
> contain UTF-8 encoded urls
> ----------------------------------------------------------------------------------------------------
>
> Key: HTTPCLIENT-1257
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1257
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Affects Versions: 4.2.2
> Reporter: Thibaut
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I'm trying to fetch:
> http://handheld.vn/content.php?4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch
> Which returns:
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "HTTP/1.1 303 See Other[\r][\n]"
> [main]
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "Date: Mon, 29 Oct 2012 17:55:57
> GMT[\r][\n]" [main]
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "Server: Apache[\r][\n]" [main]
> 2012-10-29 18:54:29,355 DEBUG http.wire: << "Expires: Thu, 19 Nov 1981
> 08:52:00 GMT[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Cache-Control: no-store,
> no-cache, must-revalidate, post-check=0, pre-check=0[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Pragma: no-cache[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Set-Cookie: bb_lastactivity=0;
> expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/[\r][\n]" [main]
> 2012-10-29 18:54:29,356 DEBUG http.wire: << "Location:
> http://handheld.vn/content/4052-????nh-gi??-m??y-t??nh-b???ng-Kindle-Fire-HD-7-inch[\r][\n]"
> [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Length: 0[\r][\n]" [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "Connection: close[\r][\n]" [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Type: text/html[\r][\n]"
> [main]
> 2012-10-29 18:54:29,357 DEBUG http.wire: << "[\r][\n]" [main]
> 2012-10-29 18:54:29,357 DEBUG conn.DefaultClientConnection: Receiving
> response: HTTP/1.1 303 See Other [main]
> 2012-10-29 18:54:29,357 DEBUG http.headers: << HTTP/1.1 303 See Other [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Date: Mon, 29 Oct 2012
> 17:55:57 GMT [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Server: Apache [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Expires: Thu, 19 Nov 1981
> 08:52:00 GMT [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Cache-Control: no-store,
> no-cache, must-revalidate, post-check=0, pre-check=0 [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Pragma: no-cache [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Set-Cookie: bb_lastactivity=0;
> expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/ [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Location:
> http://handheld.vn/content/4052-Äánh-giá-máy-tÃnh-bảng-Kindle-Fire-HD-7-inch
> [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Content-Length: 0 [main]
> 2012-10-29 18:54:29,358 DEBUG http.headers: << Connection: close [main]
> 2012-10-29 18:54:29,359 DEBUG http.headers: << Content-Type: text/html [main]
> Unfortunately I can't get the resolved URL through the following code:
> Header locationHeader = response.getFirstHeader("location");
> which will return
> http://handheld.vn/content/4052-Äánh-giá-máy-tÃnh-bảng-Kindle-Fire-HD-7-inch
> The header has already been decoded with the wrong character encoding. I will
> never be able to get the redirect URL!
> I understand that this is not RFC-compliant behavior, but the above URL and
> redirect work fine in all browsers.
> Is it possible to access the raw header (byte array) so that I can choose the
> encoding on my own? This would help a lot. Alternatively, a parameter to
> optionally specify the encoding when fetching a header value would work.
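
A possible workaround, sketched under one assumption: that the header value
was decoded one byte per char (ISO-8859-1 style), as the http.headers output
above suggests. In that case the original bytes are still recoverable from the
String and can be decoded again as UTF-8. If non-ASCII bytes had instead been
replaced with '?', this would not help. The class and method names below are
hypothetical, not part of HttpClient:

```java
import java.nio.charset.StandardCharsets;

public class LocationHeaderRecoder {

    // Hypothetical helper: turns each char back into the single byte it came
    // from (valid only for chars <= U+00FF) and decodes those bytes as UTF-8.
    static String recodeLatin1AsUtf8(String headerValue) {
        byte[] rawBytes = headerValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Start of the redirect target, written with escapes to stay ASCII.
        String original = "http://handheld.vn/content/4052-\u0110\u00e1nh-gi\u00e1";

        // Simulate the assumed mis-decoding: UTF-8 bytes read as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        String recovered = recodeLatin1AsUtf8(misdecoded);
        System.out.println(recovered.equals(original)); // prints true
    }
}
```

With the code from the report, the input would be locationHeader.getValue().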