[
https://issues.apache.org/jira/browse/METAMODEL-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293418#comment-15293418
]
ASF GitHub Bot commented on METAMODEL-1086:
-------------------------------------------
GitHub user bmehner opened a pull request:
https://github.com/apache/metamodel/pull/104
Fix for encoding error when using InputStreams
Fixes https://issues.apache.org/jira/browse/METAMODEL-1086
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bmehner/metamodel master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metamodel/pull/104.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #104
----
commit 6219cfa6e6ac4bc4e9ce5ab5ceca7abcd3d3b778
Author: Björn Mehner <[email protected]>
Date: 2016-05-20T14:04:12Z
Fix for encoding error when using InputStreams
----
> Encoding not used with InputStreams in CsvDataContext
> -----------------------------------------------------
>
> Key: METAMODEL-1086
> URL: https://issues.apache.org/jira/browse/METAMODEL-1086
> Project: Apache MetaModel
> Issue Type: Bug
> Affects Versions: 4.5.2
> Reporter: Samuel Mumm
>
> When using the constructor that takes an InputStream, you can run into
> encoding problems if your platform's default encoding differs from the one
> used in the InputStream, even though you specify an encoding in the
> CsvConfiguration.
> {code}
> CsvDataContext csvDataContext = new CsvDataContext(someInputstream,
>         new CsvConfiguration(1, "utf-8", ';', '"', '\\'));
> {code}
> The offending code is in the static method createFileFromInputStream():
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>     ....
>     final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>     final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
>     ....
> {code}
> The InputStreamReader is instantiated without a charset, so the platform's
> default charset is used (e.g. "windows-1252"). The BufferedWriter, on the
> other hand, is instantiated with the specified charset. When the content of
> the stream is written to the temp directory, it is therefore effectively
> re-encoded if the stream is in a different encoding (e.g. "utf-8") than the
> platform's default encoding.
> Instead the code should be similar to this:
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>     ....
>     final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>     final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encoding));
>     ....
> {code}
> Alternatively, you can skip the encoding completely when copying the
> InputStream, because the encoding is applied later when the FileResource is
> read. A more readable implementation in Java 7 would be:
> {code}
> final File tempFile = File.createTempFile("metamodel", ".csv");
> tempFile.deleteOnExit();
> // Copy the raw bytes; no charset is involved at this stage.
> Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
> return tempFile;
> {code}
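The behaviour described in the issue can be reproduced outside MetaModel with a small, self-contained sketch. It is not the MetaModel code: the class EncodingCopyDemo and its helper methods are hypothetical, Files.newBufferedWriter stands in for FileHelper.getBufferedWriter, and ISO-8859-1 stands in for a platform default charset that differs from the stream's UTF-8. It compares a mismatched Reader/Writer copy, a matched one, and a plain byte copy.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class EncodingCopyDemo {

    // Hypothetical helper: copies the stream to a temp file through a
    // Reader/Writer pair, decoding with readCharset and encoding with writeCharset.
    static byte[] copyViaReaderWriter(InputStream in, Charset readCharset, Charset writeCharset)
            throws IOException {
        File tempFile = File.createTempFile("metamodel", ".csv");
        tempFile.deleteOnExit();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, readCharset));
             BufferedWriter writer = Files.newBufferedWriter(tempFile.toPath(), writeCharset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
        return Files.readAllBytes(tempFile.toPath());
    }

    // Hypothetical helper: copies the raw bytes without decoding them.
    static byte[] copyBytes(InputStream in) throws IOException {
        File tempFile = File.createTempFile("metamodel", ".csv");
        tempFile.deleteOnExit();
        Files.copy(in, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return Files.readAllBytes(tempFile.toPath());
    }

    public static void main(String[] args) throws IOException {
        // "Björn;Müller" written with Unicode escapes so the source file encoding does not matter.
        byte[] original = "Bj\u00f6rn;M\u00fcller".getBytes(StandardCharsets.UTF_8);

        // Mismatch: decode as ISO-8859-1 (assumed platform default), encode as UTF-8.
        byte[] mismatched = copyViaReaderWriter(new ByteArrayInputStream(original),
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Matched: decode and encode with the configured charset.
        byte[] matched = copyViaReaderWriter(new ByteArrayInputStream(original),
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Byte copy: no charset involved.
        byte[] raw = copyBytes(new ByteArrayInputStream(original));

        System.out.println("mismatched copy identical: " + Arrays.equals(original, mismatched));
        System.out.println("matched copy identical:    " + Arrays.equals(original, matched));
        System.out.println("byte copy identical:       " + Arrays.equals(original, raw));
    }
}
```

Under these assumptions only the mismatched copy should differ from the input, since each non-ASCII character is decoded as two ISO-8859-1 characters and then written out as four UTF-8 bytes. The byte copy avoids the question entirely, which is the argument made for the Files.copy variant.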
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)