Ahmed created HTTPCLIENT-2395:
---------------------------------
Summary: Non-ASCII filename corrupted in HTTP request
Key: HTTPCLIENT-2395
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2395
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient (classic)
Affects Versions: 5.5
Environment: Ubuntu 24.04
Reporter: Ahmed
Fix For: 5.4.4
Hi team,
I recently upgraded Apache HttpClient from 5.3.1 to the newest version (5.5), and
one of the tests in my client-side service detected an issue. The issue occurs
when building an HTTP multipart request with attachments or inline parts whose
filenames contain non-ASCII characters.
Example:
{code:java}
val attachment: Part? = mimeMessage.attachments.firstOrNull()
val multipart = MultipartEntityBuilder.create()
multipart.setMode(HttpMultipartMode.EXTENDED)
multipart.addBinaryBody(
    "attachments",
    attachment?.openDataStream()?.use { it.readBytes() },
    ContentType.parse(attachment?.contentType),
    attachment?.name
)
val httpPost = HttpPost(url())
// Build the entity once, when attaching it to the request
httpPost.entity = multipart.build()
httpClient.execute(httpPost) { it.handleResponse() }
{code}
Given the following MIME message:
{code:java}
Content-Type: multipart/alternative;
boundary="------------705ZF0wSwOSffEDi6dR6B0hC"
Message-ID: <[email protected]>
From: "🌪️ R@nd0M ユーザー" <[email protected]>
To: "Tēst 🎯 Üser" <[email protected]>
Subject: =?UTF-8?B?Rml4IG1l?=
--------------705ZF0wSwOSffEDi6dR6B0hC
Content-Type: text/html
<p> HTML </p>
--------------705ZF0wSwOSffEDi6dR6B0hC
Content-Type: application/octet-stream; name="ติมเงินผิดเบอร์mPayเ.xlsx"
Content-Disposition: inline; filename="ติมเงินผิดเบอร์mPayเ.xlsx"
Content-Transfer-Encoding: base64
iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+P+/HgAFhAJ/wlseKgAAAABJRU5ErkJggg==
--------------705ZF0wSwOSffEDi6dR6B0hC--
{code}
This generates an HTTP request with the following problematic URL-encoded part:
{code:java}
Content-Disposition: form-data; name="attachments";
filename="%F0%9F%90%99_inline-%E5%9B%BE%E5%83%8F_%E6%96%87%E4%BB%B6.png";
filename*="UTF-8''UTF-8%27%27%25F0%259F%2590%2599_inline-%25E5%259B%25BE%25E5%2583%258F_%25E6%2596%2587%25E4%25BB%25B6.png"
Content-Type: image/png
{code}
filename* gets encoded twice, so the value ends up carrying a second,
percent-encoded UTF-8'' prefix. The actual value should be:
{code:java}
Content-Disposition: form-data; name="attachments";
filename="%F0%9F%90%99_inline-%E5%9B%BE%E5%83%8F_%E6%96%87%E4%BB%B6.png";
filename*="UTF-8''%F0%9F%90%99_inline-%E5%9B%BE%E5%83%8F_%E6%96%87%E4%BB%B6.png"
Content-Type: image/png
{code}
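The double encoding can be reproduced outside HttpClient. The sketch below is not the library's code: the class and method names (DoubleEncodingDemo, percentEncode, extValue) are hypothetical, and it assumes the RFC 5987 attr-char set is left unescaped. Applying the ext-value encoding once yields the expected value; applying it a second time to the already prefixed value yields the problematic one from the request above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the suspected behaviour
// and is not part of HttpClient.
final class DoubleEncodingDemo {

    // Percent-encodes every UTF-8 byte that is not an RFC 5987 attr-char.
    static String percentEncode(String value) {
        String attrChars = "!#$&+-.^_`|~";
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (alnum || attrChars.indexOf(c) >= 0) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Builds an RFC 5987 style ext-value: charset, empty language, encoded text.
    static String extValue(String value) {
        return "UTF-8''" + percentEncode(value);
    }

    public static void main(String[] args) {
        // The filename from the request above, written with Unicode escapes.
        String name = "\uD83D\uDC19_inline-\u56FE\u50CF_\u6587\u4EF6.png";
        System.out.println("encoded once:  " + extValue(name));
        // Encoding the already prefixed value again escapes ' as %27 and % as %25.
        System.out.println("encoded twice: " + extValue(extValue(name)));
    }
}
```

The second call escapes the apostrophes and percent signs produced by the first, which is where the UTF-8%27%27 and %25 sequences in the header come from.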
I suspect that the problem lies
[here|https://github.com/apache/httpcomponents-client/blob/3eda5098f82c0d5cf1ceaa72afb1c24d9836ff56/httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/HttpRFC7578Multipart.java#L104],
where an additional UTF-8'' prefix is added to the filename, on top of the one
already added while generating the multipart itself
[here|https://github.com/apache/httpcomponents-client/blob/3eda5098f82c0d5cf1ceaa72afb1c24d9836ff56/httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/FormBodyPartBuilder.java#L164].
The problem can be avoided by using LEGACY mode, but that doesn't look like an
ideal solution to me, as it doesn't support UTF-8 in headers, for example in the
From or To MIME headers.
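For a client-side regression test, one option is to inspect the filename* value of the serialized request and flag the double-encoded form. The helper below is only a sketch under stated assumptions: ExtValueCheck, percentDecode and isDoubleEncoded are hypothetical names, the input is assumed to be the ASCII ext-value already extracted from the Content-Disposition header, and only the UTF-8 charset prefix is considered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper; not part of HttpClient.
final class ExtValueCheck {

    static final String PREFIX = "UTF-8''";

    // Decodes %XX sequences; the input is assumed to be ASCII, as header values are.
    static String percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            // Not a valid escape: copy the character through unchanged.
            out.write(c);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True when decoding the ext-value once still leaves a UTF-8'' prefix behind.
    static boolean isDoubleEncoded(String extValue) {
        if (!extValue.startsWith(PREFIX)) {
            return false;
        }
        String decoded = percentDecode(extValue.substring(PREFIX.length()));
        return decoded.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(isDoubleEncoded("UTF-8''UTF-8%27%27%25F0%259F%2590%2599.png"));
        System.out.println(isDoubleEncoded("UTF-8''%F0%9F%90%99.png"));
    }
}
```

A file whose real name begins with UTF-8'' would be flagged as well; that seems an acceptable trade-off for a test-only check.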
Related JIRA: https://issues.apache.org/jira/browse/HTTPCLIENT-2360