Re: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Bruno Kinoshita
Hi,
> However, I could imagine situations where we define DuplicateHeaderMode.DEDUPLICATE, and a user isn't satisfied with our normalization strategy. For example, dots in the headers breaks ingesting the data in a third-party system. An interface could resolve this, but I guess in such

RE: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Seth Falco
I don't have a strong enough opinion to conclude what's best. Giving it more thought, I think the interface approach I proposed is overcomplicated tbh. I can't imagine needing another duplicate header mode after this. However, I could imagine situations where we define

Re: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Gary Gregory
That's clever. So we could implement a new enum value DuplicateHeaderMode.DEDUPLICATE...
Gary
On Tue, Jun 20, 2023, 14:09 Bruno Kinoshita wrote:
> Hi,
>
> Bruno says:
> > "With Pandas it automatically deduplicates the column names. Maybe that's a feature that we could have in Commons CSV

Re: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Bruno Kinoshita
Hi,
Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe that's a feature that we could have in Commons CSV too?"
>
> What does that mean and actually do? Say I have column A with row 1 value of "X" and 2nd column A with row 1 value of 2. What do I get when I ask

[CSV] Strategies to handle duplicate headers

2023-06-20 Thread Gary Gregory
Hi All,
This thread is a follow-up to https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
Bruno says: "With Pandas it automatically deduplicates the column names. Maybe that's a feature that we could have in Commons CSV too?"
What does that mean and actually do? Say I have
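
To make the question concrete, here is a minimal Java sketch of what pandas-style deduplication could look like. Pandas renames repeated column names by appending a dot and a counter (A, A.1, A.2), so each column keeps its own value under a unique name. Everything here is an assumption for illustration: the class HeaderDeduplicator and its deduplicate method are hypothetical and not part of Commons CSV, and DuplicateHeaderMode.DEDUPLICATE is only a proposal in this thread. The separator is a parameter because the thread raises the concern that dots in headers may break ingestion in a third-party system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT Commons CSV API: renames repeated headers
// in the style pandas uses (A, A.1, A.2, ...).
final class HeaderDeduplicator {

    private HeaderDeduplicator() {
    }

    // Returns a new list in which every header name is unique.
    // The first occurrence keeps its name; later occurrences get
    // separator + counter appended. Assumes headers are non-null.
    static List<String> deduplicate(List<String> headers, String separator) {
        Set<String> used = new HashSet<>();
        Map<String, Integer> lastSuffix = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());
        for (String header : headers) {
            String candidate = header;
            int n = lastSuffix.getOrDefault(header, 0);
            // Keep counting up until the candidate is unused, which also
            // avoids clashing with a literal "A.1" already in the input.
            while (!used.add(candidate)) {
                n++;
                candidate = header + separator + n;
            }
            lastSuffix.put(header, n);
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A, B, A.1, A.2]
        System.out.println(deduplicate(Arrays.asList("A", "B", "A", "A"), "."));
    }
}
```

Under this sketch, a file with two "A" columns would expose the first as "A" and the second as "A.1". Passing "_" instead of "." gives A_1, A_2 for systems that reject dots.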