[ 
https://issues.apache.org/jira/browse/CSV-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937379#comment-13937379
 ] 

Kenzley Alphonse commented on CSV-107:
--------------------------------------

I would beg to differ. This will be problematic when reading a file that starts
with a BOM character without wrapping the stream in a BOMInputStream. Your users
will not know why it failed, and this will lead to confusion.

Example code:
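import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

Reader in = new FileReader("<path to>\\vod.csv");
// in.read(new char[1]); // works if we skip the BOM character, fails otherwise.
Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in);
for (CSVRecord record : records) {
    // Fails when the file starts with a BOM: the BOM ends up in the first header name.
    System.out.println("date: " + record.get("Date"));
}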
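
To illustrate why the lookup fails, here is a minimal JDK-only sketch. It assumes the file is UTF-8 and is decoded as UTF-8; the class name, helper name and sample header row are made up for illustration and are not taken from vod.csv. Java's UTF-8 decoder does not strip the BOM, so it comes through as the character U+FEFF at the front of the first header name:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomHeaderDemo {
    // Decodes the first line of a UTF-8 byte stream and returns its first
    // comma-separated field (hypothetical helper, no CSV library involved).
    static String firstHeader(byte[] utf8Bytes) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line.split(",")[0];
        }
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 byte order mark, followed by a made-up header row.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                          'D', 'a', 't', 'e', ',', 'X', '\n'};
        String header = firstHeader(withBom);
        System.out.println(header.equals("Date"));       // false
        System.out.println(header.equals("\uFEFFDate")); // true
    }
}
```

So a header map built from that first line would be keyed on "\uFEFFDate", and a lookup for "Date" would find nothing.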

A lot of other libraries account for BOM characters natively, namely:

https://code.google.com/p/google-gson/source/browse/trunk/gson/src/main/java/com/google/gson/stream/JsonReader.java#1266
http://grepcode.com/file/repo1.maven.org/maven2/org.glassfish/javax.json/1.0/org/glassfish/json/UnicodeDetectingInputStream.java#128

Simply detecting the BOM character before you begin parsing, and skipping it if
present, is sufficient. Plus, it saves your developers debugging time and
headaches.
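
A rough sketch of that detect-and-skip step, using only the JDK. The class and method names here (BomSkipper, skipBom, readAll) are hypothetical and not part of Commons CSV; it also assumes the Reader has already decoded the BOM to U+FEFF, as happens with UTF-8 or UTF-16 input:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

public class BomSkipper {
    private static final int BOM = 0xFEFF;

    // Returns a reader positioned just after a leading U+FEFF, if there is one.
    // Any other first character is pushed back so no input is lost.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader pushback = new PushbackReader(in, 1);
        int first = pushback.read();
        if (first != -1 && first != BOM) {
            pushback.unread(first);
        }
        return pushback;
    }

    // Small helper for the demo: drains a reader into a String.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader in = skipBom(new StringReader("\uFEFFDate,Value"));
        System.out.println(readAll(in)); // Date,Value
    }
}
```

In the example code above, the reader returned by skipBom(in) could be handed to CSVFormat.EXCEL.withHeader().parse(...) in place of the raw FileReader; doing the same check inside the parser would spare callers the extra step.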

> CSVFormat.EXCEL.parse should handle byte order marks
> ----------------------------------------------------
>
>                 Key: CSV-107
>                 URL: https://issues.apache.org/jira/browse/CSV-107
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.0
>            Reporter: Kenzley Alphonse
>            Priority: Critical
>         Attachments: vod.csv
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> CSVFormat.EXCEL.parse should take byte order marks into account when reading
> the input stream. Files with a byte order mark fail to parse properly.
> In my example, a byte order mark precedes the headers in a CSV file. The
> parse fails when trying to get the header via the CSVRecord.get call.
> I marked this as critical because many users will interact with Windows
> users, who will most likely have files with a BOM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
