[jira] [Commented] (METAMODEL-1086) Encoding not used with InputStreams in CsvDataContext

ASF GitHub Bot (JIRA) Wed, 08 Jun 2016 21:21:42 -0700

    [ https://issues.apache.org/jira/browse/METAMODEL-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321854#comment-15321854 ]


ASF GitHub Bot commented on METAMODEL-1086:
-------------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/metamodel/pull/104


> Encoding not used with InputStreams in CsvDataContext
> -----------------------------------------------------
>
>                 Key: METAMODEL-1086
>                 URL: https://issues.apache.org/jira/browse/METAMODEL-1086
>             Project: Apache MetaModel
>          Issue Type: Bug
>    Affects Versions: 4.5.2
>            Reporter: Samuel Mumm
>
> When using the constructor that takes an InputStream, you can run into 
> encoding trouble if your platform's default encoding differs from the one 
> used in the InputStream, even though you specify an encoding in the 
> CsvConfiguration.
> {code}
> CsvDataContext csvDataContext = new CsvDataContext(someInputstream, new CsvConfiguration(1, "utf-8", ';', '"', '\\'));
> {code}
> The offending code is in the static method createFileFromInputStream():
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>         ....
>         final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>         final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
>         ....
> {code}
> The InputStreamReader is instantiated without a charset, so the platform's 
> default charset is used (e.g. "windows-1252"). The BufferedWriter, on the 
> other hand, is instantiated with the specified charset. When the content of 
> the stream is written to the temp directory, this effectively re-encodes the 
> data if the stream is in a different encoding (e.g. "utf-8") than the 
> platform's default encoding. 
> Instead, the code should look similar to this: 
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>         ....
>         final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>         final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encoding));
>         ....
> {code}
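
The mismatch described above can be reproduced without MetaModel. The following is only a minimal sketch, not code from the project: the class and method names are hypothetical, and ISO-8859-1 stands in for a platform default charset that differs from the data's encoding. It decodes the same UTF-8 bytes once with the wrong charset and once with the right one.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Apache MetaModel.
class CharsetMismatchDemo {

    // Reads the first line of the stream, decoding its bytes with the given charset.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // A CSV-like line containing a non-ASCII character (e with acute accent).
        String original = "1;Caf\u00e9";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: stands in for a platform default that is not UTF-8.
        Charset assumedPlatformDefault = StandardCharsets.ISO_8859_1;

        String garbled = readFirstLine(new ByteArrayInputStream(utf8Bytes), assumedPlatformDefault);
        String intact = readFirstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);

        // The two-byte UTF-8 sequence is decoded as two separate characters.
        System.out.println("wrong charset matches original: " + garbled.equals(original));
        System.out.println("right charset matches original: " + intact.equals(original));
    }
}
```

Once the garbled string is written back out in UTF-8, the damage is persisted in the temp file, which matches the behaviour reported in this issue.
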
> Alternatively, you can skip the encoding completely when copying the 
> InputStream, because the encoding is applied later when the FileResource is 
> read. A more readable implementation in Java 7 would be:
> {code}
>             tempFile = File.createTempFile("metamodel", ".csv");
>             tempFile.deleteOnExit();
>             Files.copy(resourceAsStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
>             return tempFile;
> {code}
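
To illustrate why a byte-level copy avoids the problem, here is a self-contained sketch of that approach. It is an assumption-laden illustration rather than the change that was merged: the class name and the `copyToTempFile` helper are hypothetical, and the explicit UTF-8 decode at the end stands in for whatever reads the file later with the configured encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical demo class; not part of Apache MetaModel.
class StreamToTempFileDemo {

    // Copies the stream byte for byte into a temp file; no charset is involved.
    static File copyToTempFile(InputStream in) throws IOException {
        File tempFile = File.createTempFile("metamodel", ".csv");
        tempFile.deleteOnExit();
        // createTempFile already created the file, so REPLACE_EXISTING is required.
        Files.copy(in, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tempFile;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "id;name\n1;Caf\u00e9\n".getBytes(StandardCharsets.UTF_8);

        File copy = copyToTempFile(new ByteArrayInputStream(source));
        byte[] copied = Files.readAllBytes(copy.toPath());

        // The bytes on disk are identical to the bytes in the stream.
        System.out.println("bytes preserved: " + Arrays.equals(source, copied));

        // Decoding happens only once, later, with the encoding the caller specified.
        String text = new String(copied, StandardCharsets.UTF_8);
        System.out.println("text intact: " + text.contains("Caf\u00e9"));
    }
}
```

Because no Reader or Writer touches the data during the copy, the platform default charset never comes into play.
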



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
