[jira] [Updated] (METAMODEL-1086) Encoding not used with InputStreams in CsvDataContext

Samuel Mumm (JIRA) Fri, 20 May 2016 06:49:31 -0700

     [ https://issues.apache.org/jira/browse/METAMODEL-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Samuel Mumm updated METAMODEL-1086:
-----------------------------------
    Description: 
When using the constructor that takes an InputStream, you can run into encoding 
trouble if the default encoding of your platform differs from the one used in 
the InputStream, even though you specify an encoding in the 
CsvConfiguration.
{code}
CsvDataContext csvDataContext = new CsvDataContext(someInputstream,
        new CsvConfiguration(1, "utf-8", ';', '"', '\\'));
{code}

The offending code is in the static method createFileFromInputStream():
{code}
private static File createFileFromInputStream(InputStream inputStream, String encoding) {
        ....
        final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
        final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
        ....
{code}

The InputStreamReader is instantiated without a charset, so the platform's 
default charset is used (e.g. "windows-1252"). The BufferedWriter, on the other 
hand, is instantiated with the configured charset. If the stream is in a 
different encoding (e.g. "utf-8") than the platform's default, its content is 
therefore decoded with the wrong charset and re-encoded with the configured one 
when it is written to the temp directory, which garbles non-ASCII characters.
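The following standalone sketch reproduces that effect with the plain JDK only 
(no MetaModel classes; the class and method names are made up for 
illustration): UTF-8 bytes are decoded as windows-1252, as an InputStreamReader 
without a charset would do on such a platform, and then encoded again as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of MetaModel.
class ReEncodingDemo {

    // Decode with readCharset (the InputStreamReader side), then encode with
    // writeCharset (the BufferedWriter side).
    static byte[] reEncode(byte[] input, Charset readCharset, Charset writeCharset) {
        String decoded = new String(input, readCharset);
        return decoded.getBytes(writeCharset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself independent of javac's encoding
        byte[] original = "Gr\u00fc\u00dfe;Zo\u00eb".getBytes(StandardCharsets.UTF_8);
        Charset windows1252 = Charset.forName("windows-1252");

        // Mismatched charsets: the stored bytes no longer match the original
        byte[] broken = reEncode(original, windows1252, StandardCharsets.UTF_8);
        // Matching charsets: valid UTF-8 input round-trips unchanged
        byte[] fixed = reEncode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(new String(broken, StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(original, fixed));
    }
}
```

The umlaut in "Grüße", for instance, comes back as the two characters "Ã¼" 
once the windows-1252 decoding step is involved.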

Instead, the code should look similar to this: 

{code}
private static File createFileFromInputStream(InputStream inputStream, String encoding) {
        ....
        final BufferedWriter writer = FileHelper.getBufferedWriter(file, encoding);
        final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encoding));
        ....
{code}
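A minimal standalone sketch of the same idea, runnable without MetaModel: 
FileHelper is replaced by a plain OutputStreamWriter, and the class name 
CharsetCopy, the method name copyToTempFile and the Charset parameter (instead 
of the String encoding) are assumptions for illustration. Because reader and 
writer share one charset, input that is valid in that charset ends up 
byte-for-byte unchanged in the temp file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper mirroring the proposed fix; not MetaModel code.
class CharsetCopy {

    static File copyToTempFile(InputStream inputStream, Charset charset) throws IOException {
        final File file = File.createTempFile("metamodel", ".csv");
        file.deleteOnExit();
        // try-with-resources flushes and closes the writer and closes the
        // reader, which also closes the wrapped InputStream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, charset));
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(file), charset))) {
            // copy in chunks rather than line by line so line endings are preserved
            final char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
        return file;
    }
}
```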

Alternatively, the encoding can be skipped entirely when copying the 
InputStream, because it is applied later when the FileResource is read. A more 
readable alternative implementation using Java 7 would be:

{code}
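            final File tempFile = File.createTempFile("metamodel", ".csv");
            tempFile.deleteOnExit();
            // copy the raw bytes; REPLACE_EXISTING is needed because
            // createTempFile() has already created an empty file at this path
            Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tempFile;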
{code}
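To check the byte-level variant in isolation, here is a runnable sketch of it 
as a standalone class. The class name ByteCopy and the one-argument 
createFileFromInputStream signature are hypothetical; the real method also 
receives the encoding, which this approach does not need while copying.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical standalone version of the Java 7 alternative; not MetaModel code.
class ByteCopy {

    static File createFileFromInputStream(InputStream inputStream) throws IOException {
        final File tempFile = File.createTempFile("metamodel", ".csv");
        tempFile.deleteOnExit();
        // No charset involved: bytes are copied as-is. Files.copy does not
        // close the input stream, so that stays the caller's job.
        Files.copy(inputStream, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return tempFile;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "name;city\nZo\u00eb;M\u00fcnchen\n".getBytes(StandardCharsets.UTF_8);
        File copy = createFileFromInputStream(new ByteArrayInputStream(content));
        System.out.println(Arrays.equals(content, Files.readAllBytes(copy.toPath()))); // prints true
    }
}
```

Since the copy never decodes anything, the platform's default charset cannot 
affect it; only the later read of the FileResource has to use the configured 
encoding.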

> Encoding not used with InputStreams in CsvDataContext
> -----------------------------------------------------
>
>                 Key: METAMODEL-1086
>                 URL: https://issues.apache.org/jira/browse/METAMODEL-1086
>             Project: Apache MetaModel
>          Issue Type: Bug
>    Affects Versions: 4.5.2
>            Reporter: Samuel Mumm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
