[ https://issues.apache.org/jira/browse/HTTPCLIENT-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved HTTPCLIENT-1978.
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 5.0 Beta5
                   4.5.9

Fixed in HttpCore 4.4 and 5.0.

> Unicode header values are converted into mojibake
> -------------------------------------------------
>
>                 Key: HTTPCLIENT-1978
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1978
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.7, 5.0 Beta3
>            Reporter: Ryan Schmitt
>            Priority: Major
>             Fix For: 4.5.9, 5.0 Beta5
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Unicode handling in header values is badly broken, as the examples below show:
> {{httpget.addHeader("X-I-Expect-This-Header", "Федор Достоевский")}} => 
> {{X-I-Expect-This-Header: $54>@ >AB>52A:89}}
> {{httpget.addHeader("X-I-Expect-This-Header", "宮本茂")}} => 
> {{X-I-Expect-This-Header: �,}}
> {{httpget.addHeader("X-I-Expect-This-Header", "Ἀριστοτέλης")}} => 
> {{X-I-Expect-This-Header:���Ŀĭ���}}
> The root cause is 
> [here|https://github.com/apache/httpcomponents-core/blob/589fe21a0bd3481431f08d296fff1e323a8f497d/httpcore5/src/main/java/org/apache/hc/core5/util/ByteArrayBuffer.java#L138-L140]:
> {code:java}
>         for (int i1 = off, i2 = oldlen; i2 < newlen; i1++, i2++) {
>             // (byte) keeps only the low 8 bits of each 16-bit char
>             this.array[i2] = (byte) b[i1];
>         }
> {code}
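> In this code, {{b}} is of type {{char[]}} and {{array}} is of type
> {{byte[]}}. According to [JLS § 5.1.3|https://docs.oracle.com/javase/specs/jls/se11/html/jls-5.html#jls-5.1.3]
> ("Narrowing Primitive Conversion"), "[a] narrowing conversion of a {{char}}
> to an integral type T likewise simply discards all but the _n_ lowest order
> bits, where _n_ is the number of bits used to represent type T."
> Each 16-bit {{char}} is therefore reduced to its low 8 bits: for example,
> 'Ф' (U+0424) becomes 0x24 ('$') and 'е' (U+0435) becomes 0x35 ('5'), which
> is exactly how "Федор" turns into {{$54>@}} in the first example.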
> There are a few ways we could fix this, and any of them would be better than 
> what we are doing now. The two I'll propose for consideration are:
> # Just write UTF-8 to the wire; non-ASCII characters should be tolerated as
> {{obs-text}} (bytes 0x80-0xFF, RFC 7230 § 3.2.6)
> # Replace non-ASCII characters with an empty string, space, or question mark
> See also: https://issues.apache.org/jira/browse/HTTPCLIENT-1974
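The truncation described in the root-cause analysis can be reproduced outside HttpClient. This is a minimal sketch, not HttpCore code: {{NarrowingDemo}}, {{narrowChars}} and {{wireView}} are hypothetical names for illustration. {{narrowChars}} copies the loop shape of the quoted {{ByteArrayBuffer}} excerpt, and {{wireView}} assumes the receiving side maps each byte to one character (ISO-8859-1). Under those assumptions it reproduces the first example from the report byte for byte, including an unprintable 0x14 control byte left over from 'Д' (U+0414), which is why that letter seems to vanish.

```java
import java.nio.charset.StandardCharsets;

public class NarrowingDemo {

    // Hypothetical helper with the same loop shape as the quoted
    // ByteArrayBuffer code: each 16-bit char is cast to an 8-bit byte,
    // so only its low 8 bits survive (JLS 5.1.3).
    static byte[] narrowChars(char[] b, int off, int len) {
        byte[] array = new byte[len];
        for (int i1 = off, i2 = 0; i2 < len; i1++, i2++) {
            array[i2] = (byte) b[i1];
        }
        return array;
    }

    // Assumption: the peer maps each received byte to one character,
    // as ISO-8859-1 does, so this shows what ends up in the header.
    static String wireView(String headerValue) {
        char[] chars = headerValue.toCharArray();
        byte[] bytes = narrowChars(chars, 0, chars.length);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The first example value from the report, spelled with escapes
        // so the file compiles the same way under any source encoding.
        String value = "\u0424\u0435\u0434\u043e\u0440 "
                + "\u0414\u043e\u0441\u0442\u043e\u0435\u0432\u0441\u043a\u0438\u0439";
        String wire = wireView(value);
        // U+0424 -> 0x24 '$', U+0435 -> 0x35 '5', and so on; U+0414
        // becomes the unprintable control byte 0x14.
        System.out.println(wire.equals("$54>@ \u0014>AB>52A:89")); // true
    }
}
```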
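The two fixes proposed in the report can be sketched as plain encoding functions. This is an illustration under stated assumptions, not the change that was committed to HttpCore: {{HeaderValueEncoding}}, {{encodeUtf8}} and {{encodeAsciiOrReplace}} are hypothetical names, '?' is picked from the three replacement options listed, and the ASCII variant also replaces control characters such as CR and LF, which goes beyond what the report asks for.

```java
import java.nio.charset.StandardCharsets;

public class HeaderValueEncoding {

    // Option 1 from the report: put the value's UTF-8 bytes on the wire.
    // Every byte of a multi-byte UTF-8 sequence is >= 0x80, i.e. in the
    // obs-text range, so a tolerant peer can accept and decode it.
    static byte[] encodeUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Option 2 from the report, choosing '?' as the replacement. Works per
    // code point, so a surrogate pair yields a single '?'. Assumption beyond
    // the report: ASCII control characters (CR, LF, ...) are replaced too;
    // only visible ASCII, space and horizontal tab are kept.
    static byte[] encodeAsciiOrReplace(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        value.codePoints().forEach(cp -> {
            boolean keep = (cp >= 0x20 && cp <= 0x7E) || cp == '\t';
            sb.append(keep ? (char) cp : '?');
        });
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "Fyodor " followed by the Cyrillic spelling used in the report.
        String value = "Fyodor \u0424\u0435\u0434\u043e\u0440";
        // 7 ASCII bytes + 5 Cyrillic letters x 2 bytes each
        System.out.println(encodeUtf8(value).length); // 17
        System.out.println(new String(encodeAsciiOrReplace(value), StandardCharsets.US_ASCII)); // Fyodor ?????
    }
}
```

Option 1 keeps the original text but only helps if the receiver knows to decode {{obs-text}} bytes as UTF-8; option 2 always yields printable ASCII, at the cost of losing the original characters.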



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
