Ahmed created HTTPCLIENT-2395:
---------------------------------

             Summary: Non-ASCII filename corrupted in HTTP request
                 Key: HTTPCLIENT-2395
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2395
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient (classic)
    Affects Versions: 5.5
         Environment: Ubuntu 24.04
            Reporter: Ahmed
             Fix For: 5.4.4


Hi team,

I recently upgraded Apache HttpClient from 5.3.1 to the newest version (5.5), 
and one of the tests in my client-side service detected an issue. The issue 
occurs when forming an HTTP multipart request with attachments/inlines whose 
filenames contain non-ASCII characters.

Example:

 
{code:java}
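import org.apache.hc.client5.http.classic.methods.HttpPost
import org.apache.hc.client5.http.entity.mime.HttpMultipartMode
import org.apache.hc.client5.http.entity.mime.MultipartEntityBuilder
import org.apache.hc.core5.http.ContentType

// First attachment of the parsed MIME message (may be absent)
val attachment : Part? = mimeMessage.attachments.firstOrNull()
val multipart = MultipartEntityBuilder.create()
multipart.setMode(HttpMultipartMode.EXTENDED)
// The filename passed as the last argument contains non-ASCII characters
multipart.addBinaryBody(
  "attachments",
  attachment?.openDataStream()?.use { it.readBytes()},
  ContentType.parse(attachment?.contentType),
  attachment?.name)

val httpPost = HttpPost(url())
// Build the entity once and attach it to the request
httpPost.entity = multipart.build()

httpClient.execute(httpPost) { it.handleResponse() }{code}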
 

From the given MIME message:

 
{code:java}
Content-Type: multipart/alternative;
 boundary="------------705ZF0wSwOSffEDi6dR6B0hC"
Message-ID: <[email protected]>
From: "🌪️ R@nd0M ユーザー" <[email protected]>
To: "Tēst 🎯 Üser" <[email protected]>
Subject: =?UTF-8?B?Rml4IG1l?=
--------------705ZF0wSwOSffEDi6dR6B0hC
Content-Type: text/html
<p> HTML </p>
--------------705ZF0wSwOSffEDi6dR6B0hC
Content-Type: application/octet-stream; name="ติมเงินผิดเบอร์mPayเ.xlsx"
Content-Disposition: inline; filename="ติมเงินผิดเบอร์mPayเ.xlsx"
Content-Transfer-Encoding: base64
iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+P+/HgAFhAJ/wlseKgAAAABJRU5ErkJggg==
--------------705ZF0wSwOSffEDi6dR6B0hC--
{code}

This generates an HTTP request with the following problematic URL-encoded part:

 
{code:java}
Content-Disposition: form-data; name="attachments"; 
filename="%F0%9F%90%99_inline-%E5%9B%BE%E5%83%8F_%E6%96%87%E4%BB%B6.png"; 
filename*="UTF-8''UTF-8%27%27%25F0%259F%2590%2599_inline-%25E5%259B%25BE%25E5%2583%258F_%25E6%2596%2587%25E4%25BB%25B6.png"
Content-Type: image/png{code}
The filename* value gets UTF-8 encoded twice, so the resulting filename still 
carries a UTF-8'' prefix. The actual value should be:

 

 
{code:java}
Content-Disposition: form-data; name="attachments"; 
filename="%F0%9F%90%99_inline-%E5%9B%BE%E5%83%8F_%E6%96%87%E4%BB%B6.png"; 
filename*="UTF-8''%F0%9F%90%99_inline-%E5%9B%BE%E5%83%8F_%E6%96%87%E4%BB%B6.png"
Content-Type: image/png{code}
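
To illustrate the impact on the receiving side, here is a minimal sketch of how a server might decode a filename* value. The class ExtValueDecodeSketch and its decodeExtValue method are hypothetical helpers written for this report, not part of HttpClient, and they assume the value always uses the UTF-8'' prefix followed by ASCII percent-encoded text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical receiver-side helper; not part of HttpClient. */
final class ExtValueDecodeSketch {

    /** Strips the UTF-8'' prefix and percent-decodes the remainder once. */
    static String decodeExtValue(String extValue) {
        String prefix = "UTF-8''";
        if (!extValue.startsWith(prefix)) {
            throw new IllegalArgumentException("Expected UTF-8'' prefix: " + extValue);
        }
        String encoded = extValue.substring(prefix.length());
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                // Two hex digits after '%' form one byte of the UTF-8 sequence
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                // Assumption: everything outside %XX triplets is plain ASCII
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Value taken from the problematic request in this report
        String reported = "UTF-8''UTF-8%27%27%25F0%259F%2590%2599_inline-"
                + "%25E5%259B%25BE%25E5%2583%258F_%25E6%2596%2587%25E4%25BB%25B6.png";
        String seenByServer = decodeExtValue(reported);
        // Still an encoded ext-value rather than a filename
        System.out.println(seenByServer);
        // Only a second decoding pass recovers the real name
        System.out.println(decodeExtValue(seenByServer));
    }
}
```

Under these assumptions, a single decoding pass over the reported value yields a string that still starts with UTF-8'' and contains percent escapes, which is the corrupted filename a receiver would end up with.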
 

I suspect that the problem lies 
[here|https://github.com/apache/httpcomponents-client/blob/3eda5098f82c0d5cf1ceaa72afb1c24d9836ff56/httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/HttpRFC7578Multipart.java#L104],
 where an additional UTF-8'' prefix is added to the filename on top of the one 
already added while generating the multipart itself 
[here|https://github.com/apache/httpcomponents-client/blob/3eda5098f82c0d5cf1ceaa72afb1c24d9836ff56/httpclient5/src/main/java/org/apache/hc/client5/http/entity/mime/FormBodyPartBuilder.java#L164].
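
The suspected two-stage encoding can be modelled in isolation. The sketch below is not the HttpClient code: DoubleEncodingSketch, percentEncode and extValue are hypothetical names, and the set of characters left unescaped is assumed to be the RFC 3986 unreserved set, which may differ from what the library uses. It only shows that applying a "prefix plus percent-encode" step twice produces a value of the same shape as the one in the request.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical model of the suspected behaviour; not the HttpClient implementation. */
final class DoubleEncodingSketch {

    /** Percent-encodes every UTF-8 byte except RFC 3986 unreserved characters (assumption). */
    static String percentEncode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    /** Builds an ext-value: charset prefix followed by the percent-encoded text. */
    static String extValue(String value) {
        return "UTF-8''" + percentEncode(value);
    }

    public static void main(String[] args) {
        // Filename from the problematic request, written with Unicode escapes
        String name = "\uD83D\uDC19_inline-\u56FE\u50CF_\u6587\u4EF6.png";
        String once = extValue(name);   // what a single encoding step gives
        String twice = extValue(once);  // what two stacked encoding steps give
        System.out.println(once);
        System.out.println(twice);
    }
}
```

With these assumptions, the value of twice is identical to the filename* parameter in the problematic request, which is consistent with the prefix and encoding being applied in both places.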
 

The problem can be avoided by using LEGACY mode, which does not look like an 
ideal solution to me because it does not support UTF-8 in headers, such as the 
From or To MIME headers.

Related JIRA: https://issues.apache.org/jira/browse/HTTPCLIENT-2360



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

