Anson Schwabecher created CSV-226:
-------------------------------------

             Summary: Add CSVParser test case for standard charsets
                 Key: CSV-226
                 URL: https://issues.apache.org/jira/browse/CSV-226
             Project: Commons CSV
          Issue Type: Test
          Components: Parser
    Affects Versions: 1.5
            Reporter: Anson Schwabecher
             Fix For: 1.6


Hello, I'd like to contribute a CSVParser test suite covering the standard charsets 
defined in java.nio.charset.StandardCharsets, plus UTF-32.

This is a standalone test, but it also supports a fix for CSV-107.  It additionally 
refactors and unifies the existing tests around your established workaround of 
inserting a BOMInputStream ahead of the CSVParser.

It will take a single base UTF-8 encoded file (cstest.csv) and copy it to 
multiple output files (in the target directory) with differing character sets, similar 
to the iconv tool.  Each file is then fed into the parser to exercise all the 
BOM/no-BOM Unicode variants.  I think a file-based approach is still important 
here, rather than just encoding a character stream inline as a string: if issues 
develop, the data is easy to inspect.
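To illustrate the iconv-style copy step described above, here is a minimal sketch using only java.nio; the class name, helper method, and temp-file names are my own illustration, not the actual patch code.  Note that UTF-32 is not in StandardCharsets, so it would come from Charset.forName("UTF-32").

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetCopySketch {

    /** Re-encode a UTF-8 source file into the target charset, like `iconv -f UTF-8 -t <cs>`. */
    static void transcode(Path source, Path target, Charset cs) throws IOException {
        String content = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        Files.write(target, content.getBytes(cs));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the base cstest.csv fixture.
        Path src = Files.createTempFile("cstest", ".csv");
        Files.write(src, "a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8));

        Path out = Files.createTempFile("cstest-utf16", ".csv");
        transcode(src, out, StandardCharsets.UTF_16);

        // Java's UTF-16 encoder emits a big-endian BOM (0xFE 0xFF) before the data,
        // which is exactly the case the BOMInputStream workaround has to handle.
        byte[] bytes = Files.readAllBytes(out);
        System.out.println(bytes[0] == (byte) 0xFE && bytes[1] == (byte) 0xFF);
    }
}
```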

I noticed in the project's pom.xml (rat config) that you exclude 
individual test resource files by name rather than using a wildcard expression 
to exclude every file in the directory.  Is there a reason for this?  It would be 
much better if devs did not have to maintain this configuration.

i.e., switch over to a single exclude expression:

{{<exclude>src/test/resources/**/*</exclude>}}
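For context, that single exclude would sit in the apache-rat-plugin configuration in pom.xml, roughly as follows; the exact plugin block and the per-file entries it replaces in Commons CSV may differ:

```xml
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- One wildcard covering every test resource, replacing the per-file entries. -->
      <exclude>src/test/resources/**/*</exclude>
    </excludes>
  </configuration>
</plugin>
```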



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
