[ 
https://issues.apache.org/jira/browse/CSV-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845428#comment-16845428
 ] 

Dave Moten commented on CSV-239:
--------------------------------

PR submitted. Here are the PR notes:

* add `getHeaderNames`, which returns all headers in column order, including 
repeats, which are allowed in general as per RFC 4180
* add `CSVFormat.withAllowDuplicateHeaderNames()`. `CSVFormat.DEFAULT` now 
allows duplicate header names because RFC 4180 allows non-unique header names. 
This is a behavioural change but not a breaking API change anywhere because 
there is no API contract for it (e.g. javadoc). Because `CSVFormat.DEFAULT` 
should reflect RFC 4180 I'd classify this as a bug fix.
* `CSVFormat` is `Serializable` which means adding new fields to it 
(`allowDuplicateHeaderNames`) is theoretically a breaking change. I propose we 
allow this minor breaking change and also propose that `CSVFormat` does not 
implement `Serializable` in 2.x
* fix `CSVRecord.toMap` javadoc
* fix bug in `CSVParser` where an IAE is thrown with a message about duplicate 
headers when the problem was actually a missing header name
* add test coverage
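
To illustrate why a list of header names is needed alongside a name-to-index map, here is a minimal stdlib-only sketch. It does not use Commons CSV; the class `HeaderNamesSketch` and its helpers are hypothetical, and the header line is split naively on commas with no quoting support. It only shows that a map collapses repeated names while a list keeps every column in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderNamesSketch {

    // Hypothetical helper: naive comma split, assumes no quoted fields.
    static List<String> headerNames(String headerLine) {
        return new ArrayList<>(Arrays.asList(headerLine.split(",", -1)));
    }

    // Builds a name -> column index map; a repeated name overwrites the
    // index stored for its earlier occurrence, so one column is lost.
    static Map<String, Integer> headerIndex(List<String> names) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            index.put(names.get(i), i);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names = headerNames("id,name,id");
        System.out.println(names);              // [id, name, id]
        System.out.println(headerIndex(names)); // {id=2, name=1}
    }
}
```

With duplicate names allowed, only the list form can describe every column, which is the motivation for exposing the names as a list.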

Question:
* do we need to talk about HeaderNames when we could just say Header?

Not addressed:
* would be nice if `CSVRecord.toMap` returned a Map whose entries are iterable 
in column order but this involves quite a bit of rework so will leave for 
another PR (probably for 2.x).
* `CSVRecord.get(String)` should ideally throw when two columns with that 
header name exist
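
A rough sketch of what a stricter lookup might look like, assuming the header names and the record values are available as two parallel lists. The class `StrictHeaderLookupSketch` and the method `getStrict` are hypothetical names for illustration only and are not part of Commons CSV or the PR.

```java
import java.util.List;

public class StrictHeaderLookupSketch {

    // Returns the value in the column called `name`, but refuses to guess
    // when the name is ambiguous (appears more than once) or absent.
    static String getStrict(List<String> headerNames, List<String> values, String name) {
        int found = -1;
        for (int i = 0; i < headerNames.size(); i++) {
            if (headerNames.get(i).equals(name)) {
                if (found >= 0) {
                    throw new IllegalArgumentException(
                            "Header name '" + name + "' is ambiguous: columns "
                                    + found + " and " + i);
                }
                found = i;
            }
        }
        if (found < 0) {
            throw new IllegalArgumentException("No column named '" + name + "'");
        }
        return values.get(found);
    }

    public static void main(String[] args) {
        List<String> headers = List.of("id", "name", "id");
        List<String> values = List.of("1", "Ann", "2");
        System.out.println(getStrict(headers, values, "name")); // Ann
        try {
            getStrict(headers, values, "id");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```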

Notes for 2.x: 
* for consistency `CSVFormat.withAllowMissingColumnNames` should be 
`CSVFormat.withAllowMissingHeaderNames`
* remove `Serializable` from `CSVFormat`
* `CSVFormat.withIgnoreHeaderCase` creates problems and lacks flexibility. I'd 
suggest `CSVRecord.getIgnoreCase(String)` instead

 

> Cannot get headers in column order from CSVRecord
> -------------------------------------------------
>
>                 Key: CSV-239
>                 URL: https://issues.apache.org/jira/browse/CSV-239
>             Project: Commons CSV
>          Issue Type: Improvement
>          Components: Parser
>    Affects Versions: 1.6
>            Reporter: Dave Moten
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have a use case where I read many lines from an arbitrary csv file with a 
> given CSVFormat as List<CSVRecord>, transform that list and then want to 
> write the transformed list to another file. 
> When I specify the format as CSVFormat.DEFAULT.withFirstRecordAsHeader() the 
> headers from the first line are available in the CSVRecord object via the 
> CSVRecord.toMap object but their column positions are not (the iteration of 
> the returned map does not reflect column order). Consequently I cannot write 
> a header line in the correct order to the output csv file (which I do when 
> the first CSVRecord is to be written).
> Another option would be to ensure that the CSVPrinter object writes the 
> header on the first call to CSVPrinter.printRecord but we should also be able 
> to cover the use case where we are writing to a non-csv format and we still 
> want to write the headers in the correct order. 
> My preference at minimum is that the headers with column order are available 
> from CSVRecord (after all the data to supply this is already present in 
> CSVRecord). The addition of a method `getHeaders` returning a `List<String>` 
> would do the job. I'm happy to submit a PR if desired.
> I've marked this as of minor importance but I think it's a pretty important 
> flaw in the library at the moment that prevents even the simplest of 
> round-trip (read then write) scenarios when the headers are read from the 
> file rather than known up-front.
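
For the round-trip problem described in the issue, one possible workaround is to recover column order from a header-name-to-column-index map by sorting its entries by index. The sketch below uses only the JDK; the class `ColumnOrderSketch` and the method `inColumnOrder` are hypothetical, and the input map is assumed to hold one zero-based index per unique header name (so it cannot represent duplicate names).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnOrderSketch {

    // Sorts the map entries by column index and returns just the names,
    // so the result can be written out as a header line in file order.
    static List<String> inColumnOrder(Map<String, Integer> headerIndex) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(headerIndex.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            names.add(entry.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        // A HashMap gives no guarantee that iteration follows column order.
        Map<String, Integer> headerIndex = new HashMap<>();
        headerIndex.put("zip", 2);
        headerIndex.put("name", 0);
        headerIndex.put("city", 1);
        System.out.println(String.join(",", inColumnOrder(headerIndex))); // name,city,zip
    }
}
```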



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
