[
https://issues.apache.org/jira/browse/CAMEL-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826717#comment-16826717
]
MykhailoVlakh commented on CAMEL-8356:
--------------------------------------
Hello [~njiang],
I have a question about this issue and the way it was fixed. Either there is a
defect or I am misunderstanding something; maybe you can help me?
The issue is the following: I have a file consumer that I use to process
UTF-8 encoded XML files. In my processor I take the content of an incoming
file like this:
{code:java}
exchange.getIn().getBody(InputStream.class)
{code}
and then I pass it to the XML parser together with the configured charset
(UTF-8) to make sure that the reader used by the XML parser decodes the binary
stream correctly.
The issue is that if the default system charset is not UTF-8 (a valid case for
my application), I get question marks instead of Chinese characters.
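A minimal sketch of the parsing step described above, assuming a plain JAXP DOM parser; the class and method names (XmlCharsetSketch, parse) are hypothetical and not taken from my application. It shows that an untouched UTF-8 byte stream plus an explicit encoding is enough for the parser to decode Chinese characters, whatever the system default charset is.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlCharsetSketch {

    // Parses a raw byte stream, telling the parser which charset to decode it with.
    static Document parse(InputStream in, String charset) throws Exception {
        InputSource source = new InputSource(in);
        source.setEncoding(charset); // explicit encoding instead of the platform default
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Two Chinese characters, encoded as UTF-8 bytes (stand-in for the file body).
        byte[] utf8 = "<msg>\u4e2d\u6587</msg>".getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(utf8), "UTF-8");
        String text = doc.getDocumentElement().getTextContent();
        System.out.println("decoded correctly: " + "\u4e2d\u6587".equals(text));
    }
}
```

This only works when the stream still carries the original file bytes, which is the expectation the rest of this comment is about.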
After some debugging I found something strange: the converter that is used to
give me the file body as an InputStream,
org.apache.camel.converter.IOConverter.toInputStream(File, String), reads
characters from the file and then encodes them into the system's default
charset. This blew my mind; it looks wrong.
When I take the body as an InputStream I expect to get a binary stream, not a
re-encoded character stream. The current behavior seems totally unexpected to
me.
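The effect can be reproduced without Camel. The sketch below is not the converter's actual code: reencode is a hypothetical stand-in for the decode-then-encode step described above, and ISO-8859-1 is assumed as an example of a non-UTF-8 default charset so that the result does not depend on the machine it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReencodeSketch {

    // Hypothetical stand-in: decode the bytes with the file charset,
    // then encode the characters again with a different target charset.
    static byte[] reencode(byte[] raw, Charset fileCharset, Charset targetCharset) {
        return new String(raw, fileCharset).getBytes(targetCharset);
    }

    public static void main(String[] args) {
        // Two Chinese characters as they would be stored in a UTF-8 file (6 bytes).
        byte[] original = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        // ISO-8859-1 cannot represent these characters, so the encoder
        // substitutes '?' for each of them and the original bytes are lost.
        byte[] reencoded = reencode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // A consumer that decodes the stream as UTF-8 now sees only question marks.
        String seenByParser = new String(reencoded, StandardCharsets.UTF_8);
        System.out.println(original.length + " bytes in, " + reencoded.length + " bytes out");
        System.out.println(seenByParser); // prints "??"
    }
}
```

Once the characters have been replaced, no charset setting on the consumer side can recover them, which is why a raw binary stream matters here.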
Thank you for your answer in advance.
> IOConverter.toInputStream(file, charset) returns strange behaving stream
> ------------------------------------------------------------------------
>
> Key: CAMEL-8356
> URL: https://issues.apache.org/jira/browse/CAMEL-8356
> Project: Camel
> Issue Type: Bug
> Components: camel-core
> Affects Versions: 2.14.1, 2.15.0
> Reporter: Stefan Mandel
> Assignee: Willem Jiang
> Priority: Major
> Fix For: 2.14.2, 2.15.0
>
> Attachments:
> CAMEL8356-repaired-Test-and-adjusted-converter-imple.patch,
> IOConverterCharsetTest.java, german.iso-8859-1.txt, german.utf-8.txt,
> result.txt, source.txt
>
>
> Calling IOConverter.toInputStream with either UTF-8 or ISO-8859-1 returns a
> stream that behaves strangely on non-ASCII characters:
> - putting this stream into an InputStreamReader will return incorrectly
> encoded characters
> - a naive new BufferedReader(new InputStreamReader(new FileInputStream(file),
> charset)) will return the correctly encoded characters.
> I will attach some unit tests for this case.
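For reference, the naive reader mentioned in the issue description can be sketched as follows. This is an illustration only, not one of the attached unit tests; the class name DirectReaderSketch, the helper readAll, and the temporary file are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DirectReaderSketch {

    // Reads the whole file, decoding its raw bytes with the given charset.
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // German sample text with non-ASCII characters, stored as ISO-8859-1.
        String expected = "Gr\u00fc\u00dfe";
        Path file = Files.createTempFile("german.iso-8859-1", ".txt");
        try {
            Files.write(file, expected.getBytes(StandardCharsets.ISO_8859_1));
            String actual = readAll(file.toFile(), StandardCharsets.ISO_8859_1);
            System.out.println("round trip ok: " + expected.equals(actual));
        } finally {
            Files.delete(file);
        }
    }
}
```

The file bytes are decoded exactly once, with the charset the caller names, so the platform default never enters the picture.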