On 12/9/2013 7:43 PM, Oleg Kalnichevski wrote:
On Mon, 2013-12-09 at 19:15 +0530, Dhruvakumar P G wrote:
On 12/9/2013 4:41 PM, Oleg Kalnichevski wrote:
On Mon, 2013-12-09 at 13:09 +0530, Dhruvakumar P G wrote:
Hello,

I'm in the middle of upgrading the HttpClient, HttpMime and HttpCore libraries to
the latest version. I haven't been able to figure out a solution to the
following problem.
In the first case, HttpClient downloads a text file (icité Àâqë-withmultibytechars.txt)
whose name contains multibyte characters from another server and sends it to
the browser.
*The server returns the response headers as below :*

HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Content-Disposition: attachment; filename="icité Àâqë-withmultibytechars.txt"
Content-Type: application/octet-stream
Content-Length: 162

*The browser receives the headers below and shows the filename correctly:*

Content-Disposition    attachment; filename="icité Àâqë-withmultibytechars.txt"
Content-Type    application/octet-stream
Transfer-Encoding    chunked

In the second case, HttpClient downloads an image file (ウェ.jpg) from another
server and sends it to the browser.
*The server returns the response headers below:*
HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Content-Disposition: attachment; filename="ウェ.jpg"
Content-Encoding: gzip
Content-Type: application/octet-stream
Transfer-Encoding: chunked

Even though the server returns the "Content-Encoding: gzip" header,
the response object doesn't have it.
Somehow this header is removed from the response when the request
is executed:
_response = _httpClient.execute(_httpHost, _httpMethod, _httpContext);

*The browser does not receive this header, and the non-ASCII characters in the
filename are not recognized in the download dialog, which just shows blank
characters:*
Content-Disposition    attachment; filename="   .jpg"
Content-Type    application/octet-stream
Transfer-Encoding    chunked, chunked

Am I missing something here? How do I make sure that HttpClient
doesn't drop this header and that the browser shows the filename correctly?

HTTP message headers may not contain non-ASCII characters per the requirements
of the HTTP protocol. The target server is in violation of the HTTP specification.
Yes indeed, the target server should return an encoded filename:
*Content-disposition: attachment; filename="=?utf-8?B?44Km44KnLmpwZw==?="*
But instead it is returning an unencoded filename:
Content-Disposition: attachment; filename="ウェ.jpg"
Is there no way to resolve my issue unless the target server returns an encoded filename?
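
For reference, the encoded filename quoted above has the shape of a MIME encoded-word (RFC 2047, "B" encoding): the UTF-8 bytes of the name, Base64-encoded, wrapped in =?utf-8?B?...?=. A minimal sketch of how such a value can be produced with the JDK alone follows; the class and method names are made up for illustration and are not part of HttpClient. Note that browsers differ in whether they decode encoded-words inside Content-Disposition, so this only illustrates the format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not an HttpClient API.
final class EncodedWordSketch {

    // Wraps the UTF-8 bytes of the file name in an RFC 2047 "B" encoded-word.
    static String encodeFilename(String fileName) {
        byte[] utf8 = fileName.getBytes(StandardCharsets.UTF_8);
        return "=?utf-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    public static void main(String[] args) {
        // \u30A6\u30A7 are the two katakana characters of the file name in this thread.
        String encoded = encodeFilename("\u30A6\u30A7.jpg");
        System.out.println("Content-Disposition: attachment; filename=\"" + encoded + "\"");
    }
}
```

For the file name ウェ.jpg this yields the same encoded-word as the header quoted above.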

Thanks,
Dhruva
One can force HttpClient, though, to use a non-standard charset for HTTP
messages by using a custom ConnectionConfig.

Oleg

I have set the charset to UTF-8:
connectionConfigBuilder.setCharset(Consts.UTF_8)
Will setting the charset to any other value stop HttpClient from losing the
'Content-Encoding' response header?

I am not aware of a single confirmed case of HttpClient losing headers.
You can use wire / context logging to see what data packets are
transmitted across the wire.
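
A minimal sketch of one way to switch on wire logging, assuming Commons Logging with its built-in SimpleLog backend is what HttpClient ends up using (with log4j or another backend the configuration looks different). The class name is made up for illustration; the properties have to be set before any HttpClient class is loaded.

```java
// Hypothetical helper; assumes Commons Logging + SimpleLog is the active backend.
final class WireLoggingSketch {

    static void enableWireLogging() {
        // Route Commons Logging to the simple stderr logger.
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Context logging for HttpClient 4.x classes.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http", "DEBUG");
        // Wire logging: the raw bytes sent to and received from the target server.
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.wire", "DEBUG");
    }

    public static void main(String[] args) {
        enableWireLogging();
        // Build the HttpClient and execute the request after this point.
    }
}
```

The wire log then shows whether Content-Encoding arrives from the target server before any response processing takes place.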

Oleg
Hello,
To narrow down the problem, I have disabled compression on the target server. The target server no longer returns the Content-Encoding header.

The target server always returns the non-ASCII filename unencoded in the header (Content-Disposition: attachment; filename=" ウェ.jpg"), which is a violation of the HTTP specification. My requirement is to show the multibyte filename in every browser when the user downloads the attachment, without losing any character in the name.

With the earlier version of HttpClient (4.0.1), the target server returns the unencoded non-ASCII filename as below:
Content-Disposition: attachment; filename="ウェ - multibyte.txt"
Content-Type: text/plain;charset=utf-8

The filename then appears in a kind of encoded form in the response from HttpClient (4.0.1), as below:
Content-Disposition    attachment; filename="ウェ - multibyte.txt"
Content-Type    text/plain;charset=utf-8

As a result, the browser is able to decode the filename and show it correctly in the download dialog.
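
One possible reading of "kind of encoded", offered as an assumption and not as a confirmed description of HttpClient internals: if the UTF-8 bytes of a header are decoded one byte per character (as ISO-8859-1 does) and later written back the same way, the original bytes reach the browser intact, whereas text decoded as real UTF-8 and then written with a single-byte charset loses the characters that charset cannot represent. The JDK-only sketch below shows both effects; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo of charset round trips on header bytes; not HttpClient code.
final class HeaderCharsetSketch {

    // Decode the wire bytes as ISO-8859-1 and re-encode them the same way.
    static boolean latin1RoundTripKeepsBytes(String headerValue) {
        byte[] wire = headerValue.getBytes(StandardCharsets.UTF_8);
        String mojibake = new String(wire, StandardCharsets.ISO_8859_1);
        byte[] resent = mojibake.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(wire, resent);
    }

    // Decode the wire bytes as UTF-8, then write them out as ISO-8859-1.
    // The JDK substitutes '?' for unmappable characters; other software
    // may substitute something else or drop them.
    static String utf8ThenLatin1(String headerValue) {
        byte[] wire = headerValue.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(wire, StandardCharsets.UTF_8);
        byte[] resent = decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(resent, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u30A6\u30A7 - multibyte.txt";
        System.out.println(latin1RoundTripKeepsBytes(name));
        System.out.println(utf8ThenLatin1(name));
    }
}
```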

But in the response from HttpClient (4.3.1), the filename is exactly the same as what we got from the target server. It is not changed into any encoded form, unlike with HttpClient (4.0.1):
Content-Disposition: attachment; filename="ウェ - multibyte.txt"
Content-Type: text/plain;charset=utf-8

As a result, the browser is not able to show the filename correctly. The download dialog shows *'- multibyte.txt'* and
the response headers in Firebug show:
Content-Disposition     attachment; filename=" - multibyte.txt"
Content-Type    text/plain;charset=utf-8



Is the above change in behaviour from 4.0 to 4.3 expected?
If so, *how do I make sure that the multibyte filename is displayed correctly in all browsers, given that the target server always returns it in unencoded form*?
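
One standards-based option, sketched here as an assumption and not as something proposed in this thread, is for the proxying application to rebuild the header before sending it to the browser: an ASCII-only filename parameter as a fallback, plus a filename* parameter in the RFC 5987 form (UTF-8 bytes, percent-encoded) as described by RFC 6266. The class and method names below are made up for illustration, and support for filename* should be checked against the browsers that matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HttpClient API.
final class ContentDispositionSketch {

    // Non-alphanumeric characters RFC 5987 allows unescaped (attr-char).
    private static final String ATTR_CHARS = "!#$&+-.^_`|~";

    // Percent-encodes the UTF-8 bytes of the value, leaving attr-chars as they are.
    static String encodeRfc5987(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || (c < 0x80 && ATTR_CHARS.indexOf(c) >= 0)) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Builds a header value with an ASCII fallback and an encoded filename*.
    static String buildHeaderValue(String fileName) {
        // Replace anything outside printable ASCII, plus quote and backslash, with '_'.
        String fallback = fileName.replaceAll("[^\\x20-\\x7E]|[\"\\\\]", "_");
        return "attachment; filename=\"" + fallback + "\"; filename*=UTF-8''"
                + encodeRfc5987(fileName);
    }

    public static void main(String[] args) {
        System.out.println("Content-Disposition: "
                + buildHeaderValue("\u30A6\u30A7 - multibyte.txt"));
    }
}
```

The rebuilt value contains only ASCII, so it does not depend on how the servlet container or the browser treats non-ASCII header bytes.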


Thanks & Regards,
Dhruva


