[ 
https://issues.apache.org/jira/browse/CAMEL-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826717#comment-16826717
 ] 

MykhailoVlakh commented on CAMEL-8356:
--------------------------------------

Hello [~njiang],

I have a question about this issue and the way it was fixed. It looks to me like 
there is a defect, or perhaps I am misunderstanding something; maybe you can help me?

The issue is the following: I have a file consumer that I use to process XML 
files encoded in UTF-8. In my processor I take the content of an incoming 
file like this:
{code:java}
exchange.getIn().getBody(InputStream.class){code}
and then I pass it to the XML parser together with the configured charset (UTF-8) 
to make sure that the reader used by the XML parser decodes the binary 
stream correctly. 
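
As a minimal sketch of the setup described above: the snippet below hands an InputStream to an XML parser together with an explicit charset. It assumes the JDK's JAXP DOM parser, which may differ from the parser actually used, and a ByteArrayInputStream stands in for the message body. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlStreamParsing {

    // Parses the stream as XML, telling the parser which charset the bytes use,
    // and returns the text content of the root element.
    // Hypothetical helper, not part of Camel.
    static String rootText(InputStream body, String charsetName) {
        try {
            InputSource source = new InputSource(body);
            source.setEncoding(charsetName); // explicit charset instead of the platform default
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for exchange.getIn().getBody(InputStream.class)
        byte[] utf8 = "<name>\u4e2d\u6587</name>".getBytes(StandardCharsets.UTF_8);
        String text = rootText(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(text.equals("\u4e2d\u6587"));
    }
}
```

This only works as intended when the stream carries the file's original bytes, which is the assumption the rest of this comment is about.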

The issue is that if the default system charset is not UTF-8 (a valid case for my 
application), I get question marks instead of Chinese characters. 

After some debugging I found something strange: the converter that is used to 
give me the file body as an InputStream, 
org.apache.camel.converter.IOConverter.toInputStream(File, String), 
reads characters from the file and then encodes them into the 
system's default charset. This blew my mind; it looks wrong.
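
To illustrate why this produces question marks, here is a minimal sketch in plain JDK code. It does not call Camel; the hypothetical reencode helper only imitates the decode-then-encode step described above, and ISO-8859-1 is an assumed stand-in for a non-UTF-8 system default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodingDemo {

    // Decodes the raw bytes with sourceCharset, then encodes the characters
    // again with targetCharset. Hypothetical stand-in for the converter's
    // behavior as described in this comment, not Camel's actual code.
    static byte[] reencode(byte[] raw, Charset sourceCharset, Charset targetCharset) {
        String decoded = new String(raw, sourceCharset);
        // String.getBytes replaces characters the target charset cannot
        // represent with that charset's replacement byte ('?' for ISO-8859-1).
        return decoded.getBytes(targetCharset);
    }

    public static void main(String[] args) {
        byte[] fileBytes = "<name>\u4e2d\u6587</name>".getBytes(StandardCharsets.UTF_8);

        // Assumed system default charset for this illustration: ISO-8859-1.
        byte[] reencoded = reencode(fileBytes, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // A consumer that still decodes the stream as UTF-8 now sees '?' characters.
        System.out.println(new String(reencoded, StandardCharsets.UTF_8));
    }
}
```

Once the characters have been replaced during encoding, the original bytes cannot be recovered from the stream, no matter which charset the consumer uses afterwards.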

When I take the body as an InputStream, I expect to get a binary stream, not a 
re-encoded character stream. The current behavior seems totally unexpected to 
me.
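
For comparison, a minimal sketch of what a plain binary stream over a file looks like with the JDK alone: the bytes arrive exactly as stored and no charset is involved. The class and helper names are made up for illustration, and readAllBytes assumes Java 9 or newer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RawFileStream {

    // Opens the file as a plain byte stream; no decoding or encoding happens.
    static InputStream openRaw(Path file) {
        try {
            return Files.newInputStream(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the bytes to a temporary file, reads them back through openRaw
    // and reports whether they arrived unchanged.
    static boolean roundTripsUnchanged(byte[] content) {
        try {
            Path tmp = Files.createTempFile("raw-stream", ".xml");
            try {
                Files.write(tmp, content);
                try (InputStream in = openRaw(tmp)) {
                    return Arrays.equals(content, in.readAllBytes());
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8Xml = "<name>\u4e2d\u6587</name>".getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTripsUnchanged(utf8Xml));
    }
}
```

With a stream like this, the charset decision stays with the consumer (here, the XML parser), regardless of the system default.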

Thank you for your answer in advance. 

 

> IOConverter.toInputStream(file, charset) returns strange behaving stream
> ------------------------------------------------------------------------
>
>                 Key: CAMEL-8356
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8356
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-core
>    Affects Versions: 2.14.1, 2.15.0
>            Reporter: Stefan Mandel
>            Assignee: Willem Jiang
>            Priority: Major
>             Fix For: 2.14.2, 2.15.0
>
>         Attachments: 
> CAMEL8356-repaired-Test-and-adjusted-converter-imple.patch, 
> IOConverterCharsetTest.java, german.iso-8859-1.txt, german.utf-8.txt, 
> result.txt, source.txt
>
>
> Calling IOConverter.toInputStream with either UTF-8 or ISO-8859-1 returns a 
> stream that behaves strangely on non-ASCII characters:
> - putting this stream into an InputStreamReader will return wrongly encoded 
> characters
> - a naive new BufferedReader(new InputStreamReader(new FileInputStream(file), 
> charset)) will return the correctly encoded characters.
> I will attach some unit tests for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
