[ https://issues.apache.org/jira/browse/METAMODEL-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kasper Sørensen resolved METAMODEL-1086.
----------------------------------------
    Resolution: Fixed
      Assignee: Kasper Sørensen
 Fix Version/s: 5.0.0

> Encoding not used with InputStreams in CsvDataContext
> -----------------------------------------------------
>
>                 Key: METAMODEL-1086
>                 URL: https://issues.apache.org/jira/browse/METAMODEL-1086
>             Project: Apache MetaModel
>          Issue Type: Bug
>    Affects Versions: 4.5.2
>            Reporter: Samuel Mumm
>            Assignee: Kasper Sørensen
>             Fix For: 5.0.0
>
>
> When using the constructor that takes an InputStream, you can run into
> encoding problems if the platform's default encoding differs from the one
> used in the InputStream, even though you specify an encoding in the
> CsvConfiguration.
> {code}
> CsvDataContext csvDataContext = new CsvDataContext(someInputstream,
>         new CsvConfiguration(1, "utf-8", ';', '"', '\\'));
> {code}
> The offending code is in the static method createFileFromInputStream():
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>     ....
>     final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>     final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
>     ....
> {code}
> The InputStreamReader is instantiated without a charset, so the platform's
> default charset is used (e.g. "windows-1252"). The BufferedWriter, on the
> other hand, is instantiated with the specified charset. If the stream is in
> a different encoding (e.g. "utf-8") than the platform's default encoding,
> this effectively re-encodes the content when the stream is written to the
> temp directory: the bytes are decoded with the wrong charset and then
> encoded with the specified one.
> Instead, the code should be similar to this:
> {code}
> private static File createFileFromInputStream(InputStream inputStream, String encoding) {
>     ....
>     final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
>     final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encoding));
>     ....
> {code}
> Alternatively, you can skip the encoding completely when copying the
> InputStream, because the encoding is only needed later, when the
> FileResource is read. A more readable implementation in Java 7 would be:
> {code}
> final File tempFile = File.createTempFile("metamodel", ".csv");
> tempFile.deleteOnExit();
> Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
> return tempFile;
> {code}
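
The effect described in the report can be reproduced outside MetaModel. The following is a minimal sketch, not the MetaModel code: the class name EncodingCopyDemo and its two helper methods are hypothetical, and ISO-8859-1 stands in for a platform default charset such as "windows-1252" so that the result does not depend on the machine running it. It copies UTF-8 bytes once through a Reader/Writer pair with mismatched charsets and once as raw bytes with Files.copy, then compares the results with the original.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical demo class; it only imitates the copy step described in the issue.
class EncodingCopyDemo {

    // Character-based copy: decode with readCharset, encode with writeCharset.
    // With two different charsets this mirrors the reported bug.
    static byte[] copyViaReaderWriter(byte[] input, Charset readCharset, Charset writeCharset)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), readCharset);
             Writer writer = new OutputStreamWriter(out, writeCharset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
        // The writer has been closed (and flushed) by try-with-resources at this point.
        return out.toByteArray();
    }

    // Byte-based copy to a temp file, as in the Java 7 alternative; no charset involved.
    static byte[] copyRawBytes(byte[] input) throws IOException {
        File tempFile = File.createTempFile("metamodel", ".csv");
        tempFile.deleteOnExit();
        Files.copy(new ByteArrayInputStream(input), tempFile.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
        return Files.readAllBytes(tempFile.toPath());
    }

    public static void main(String[] args) throws IOException {
        // "\u00f8" is the letter o with stroke; it takes two bytes in UTF-8.
        byte[] original = "name\nS\u00f8rensen\n".getBytes(StandardCharsets.UTF_8);

        byte[] mismatched = copyViaReaderWriter(original,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        byte[] matched = copyViaReaderWriter(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        byte[] raw = copyRawBytes(original);

        System.out.println("mismatched charsets preserve bytes: " + Arrays.equals(original, mismatched));
        System.out.println("matching charsets preserve bytes:   " + Arrays.equals(original, matched));
        System.out.println("raw byte copy preserves bytes:      " + Arrays.equals(original, raw));
    }
}
```

Only the mismatched copy changes the content: each two-byte UTF-8 character is decoded as two separate ISO-8859-1 characters and then written back as four bytes. Both passing the encoding to the InputStreamReader and copying raw bytes avoid this.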