[
https://issues.apache.org/jira/browse/CSV-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206389#comment-17206389
]
Syed Shah commented on CSV-264:
-------------------------------
It looks to me like this is intentional.
https://github.com/apache/commons-csv/blob/master/src/main/java/org/apache/commons/csv/CSVParser.java#L506
{code:java}
// Note: This will always allow a duplicate header if the header is empty
{code}
A possible reason for this could be the following:
||A||B|| || ||C||D||
|1|2| | |3|4|
I don't think it would be user-friendly to make this a failure, even though it
does have duplicate column names if we count the empty headers that are left
for formatting. Granted, I do think it would be silly to essentially store a
bunch of empty cells as well.
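For reference, the current rule can be modelled in a few lines of plain Java. This is a simplified sketch, not the CSVParser source: the class and method names are made up, "empty" is assumed to mean null or blank after trimming, names are compared case-sensitively, and the separate withAllowMissingColumnNames check is left out. With it, the gapped header from the table passes while a real duplicate is reported:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the current duplicate-header rule; not the Commons CSV source.
class CurrentHeaderRule {

    /** Treats null or whitespace-only names as "empty" (assumption). */
    static boolean isEmpty(String name) {
        return name == null || name.trim().isEmpty();
    }

    /**
     * Returns the first non-empty header name that appears more than once,
     * or null if there is none. Empty names are never reported, mirroring
     * "always allow a duplicate header if the header is empty".
     */
    static String findDuplicate(String[] headers) {
        Set<String> seen = new HashSet<>();
        for (String name : headers) {
            if (isEmpty(name)) {
                continue; // empty names are exempt from the duplicate check
            }
            if (!seen.add(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Header row of the formatting example: two empty gap columns.
        String[] gapped = {"A", "B", "", "", "C", "D"};
        // A header with a genuine duplicate.
        String[] duplicated = {"A", "B", "A"};
        System.out.println(findDuplicate(gapped));     // null
        System.out.println(findDuplicate(duplicated)); // A
    }
}
{code}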
---
I can think of a couple of ways this issue could be solved:
# Simply count empty header names towards duplicates. Documents like the one
above, which use gaps for formatting, would then have to enable
withAllowDuplicateHeaderNames; I don't think this is ideal.
# Check the entire column when parsing the document, and if it contains values
in a non-header row, count it towards duplicates. This would avoid flagging
empty headers that exist only for formatting, but it sounds like it could get
heavy, since every row would have to be read before the header is validated.
# Create a new option that allows empty duplicates, but I don't think this
option makes much sense, as it's too similar to, and conflicts with,
withAllowDuplicateHeaderNames.
# Change the withAllowDuplicateHeaderNames boolean to a withDuplicateHeaderRule
enum with the values "ALLOW_ALL_DUPLICATES", "ALLOW_EMPTY_DUPLICATES",
"DISALLOW_DUPLICATES".
I was going to submit a PR for #1, but I don't think it's a good solution.
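To give a sense of why #2 could get heavy: the duplicate check would need the data rows as well as the header, so every record has to be read (or buffered) before the header can be accepted, instead of validating the first line on its own. A rough sketch, assuming the rows are already in memory as String arrays; the class and method names are hypothetical:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of option #2; not Commons CSV code.
class ColumnAwareDuplicateCheck {

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** True if any data row has a non-blank value in the given column. */
    static boolean columnHasData(String[][] rows, int column) {
        for (String[] row : rows) {
            if (column < row.length && !isBlank(row[column])) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the first duplicated header name, or null if there is none.
     * Empty-named columns only count when some row actually fills them.
     */
    static String findDuplicate(String[] headers, String[][] rows) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < headers.length; i++) {
            String name = headers[i] == null ? "" : headers[i];
            if (isBlank(name) && !columnHasData(rows, i)) {
                continue; // pure formatting gap: ignored
            }
            if (!seen.add(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Formatting example: the two gap columns are empty in every row.
        String[] gapHeader = {"A", "B", "", "", "C", "D"};
        String[][] gapRows = {{"1", "2", "", "", "3", "4"}};
        // CSV-264 example: both empty-named columns hold data.
        String[] bugHeader = {"", "a", ""};
        String[][] bugRows = {{"1", "X", "3"}, {"3", "Y", "4"}};
        System.out.println(findDuplicate(gapHeader, gapRows)); // null
        System.out.println("[" + findDuplicate(bugHeader, bugRows) + "]"); // [] : "" is the duplicate
    }
}
{code}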
If a maintainer could say what they think of this, I'd gladly try to submit a
patch for it. I'm just unsure of the ideal solution, or whether this is
considered a non-issue.
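For #4, a rough sketch of how the enum might carry the validation. The three values are the ones proposed in #4; the enum name DuplicateHeaderRule (borrowed from withDuplicateHeaderRule), the validate method, the exception message and treating null or blank names as "empty" are all hypothetical, not existing Commons CSV API. ALLOW_EMPTY_DUPLICATES keeps the current behaviour of withAllowDuplicateHeaderNames(false), and DISALLOW_DUPLICATES amounts to #1:

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical enum from option #4; not part of Commons CSV 1.8.
enum DuplicateHeaderRule {
    ALLOW_ALL_DUPLICATES,
    ALLOW_EMPTY_DUPLICATES,
    DISALLOW_DUPLICATES;

    /** Throws IllegalArgumentException if the header row breaks this rule. */
    void validate(String[] headers) {
        if (this == ALLOW_ALL_DUPLICATES) {
            return; // anything goes, like withAllowDuplicateHeaderNames(true)
        }
        Set<String> seen = new HashSet<>();
        for (String name : headers) {
            boolean empty = name == null || name.trim().isEmpty();
            if (empty && this == ALLOW_EMPTY_DUPLICATES) {
                continue; // current behaviour: empty names may repeat
            }
            // Treat null as "" so repeated missing names are also caught.
            String key = name == null ? "" : name;
            if (!seen.add(key)) {
                throw new IllegalArgumentException(
                        "Duplicate header name \"" + key + "\" not allowed by " + this);
            }
        }
    }

    public static void main(String[] args) {
        String[] header = {"", "a", ""}; // header row from CSV-264
        for (DuplicateHeaderRule rule : values()) {
            try {
                rule.validate(header);
                System.out.println(rule + ": accepted");
            } catch (IllegalArgumentException e) {
                System.out.println(rule + ": " + e.getMessage());
            }
        }
    }
}
{code}

Run against the CSV-264 header, the first two values accept it and DISALLOW_DUPLICATES rejects it.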
> Duplicate empty header names are allowed even with
> `.withAllowDuplicateHeaderNames(false)`
> ------------------------------------------------------------------------------------------
>
> Key: CSV-264
> URL: https://issues.apache.org/jira/browse/CSV-264
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.8
> Reporter: Sagar Tiwari
> Priority: Major
>
> I'm trying to parse a CSV like this:
>
> {code}
> CSVFormat.DEFAULT
>     .withHeader()
>     .withAllowDuplicateHeaderNames(false)
>     .withAllowMissingColumnNames()
>     .parse(InputStreamReader(FileInputStream(fl)))
> {code}
>
> One would expect this code to throw an error if the following CSV is given as
> input:
>
> {noformat}
> "","a",""
> "1","X","3"
> "3","Y","4"
> {noformat}
>
> But it doesn't, and asking for `record.get("")` gives the value from the
> second of the two empty-named columns (the third column). The first column is
> ignored.
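The lookup behaviour can be reproduced with a plain map. This sketch assumes the parser fills a name-to-index map left to right with Map.put, so a repeated name overwrites the earlier index; that is an assumption about the 1.8 internals, and HeaderLookupModel is a hypothetical name:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the header-name -> column-index mapping; not Commons CSV code.
class HeaderLookupModel {

    /** Maps each header name to its column index; a repeated name keeps the LAST index. */
    static Map<String, Integer> buildHeaderMap(String[] headers) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (int i = 0; i < headers.length; i++) {
            map.put(headers[i], i); // put() overwrites the earlier index for duplicates
        }
        return map;
    }

    /** Looks a value up by header name, as record.get(name) would in this model. */
    static String get(Map<String, Integer> headerMap, String[] record, String name) {
        Integer index = headerMap.get(name);
        return index == null ? null : record[index];
    }

    public static void main(String[] args) {
        String[] header = {"", "a", ""};
        String[] firstRecord = {"1", "X", "3"};
        Map<String, Integer> headerMap = buildHeaderMap(header);
        System.out.println(headerMap);                       // {=2, a=1}
        System.out.println(get(headerMap, firstRecord, "")); // 3
    }
}
{code}

Under that assumption the later empty-named column wins, so the first column cannot be reached by name.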
--
This message was sent by Atlassian Jira
(v8.3.4#803005)