Re: [CSV] Strategies to handle duplicate headers

2023-06-21 Thread sebb

On Tue, 20 Jun 2023 at 12:39, Gary Gregory wrote: > > Hi All, > > This thread is a follow-up to > https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258 > > Bruno says: > "With Pandas it automatically deduplicates the column names. Maybe > that's a feature that we could have in

RE: Re: [CSV] Strategies to handle duplicate headers

2023-06-21 Thread Seth Falco

I don't have a strong enough opinion to conclude what's best. Giving it more thought, I think the interface approach I proposed is overcomplicated tbh. I can't imagine needing another duplicate header mode after this. However, I could imagine situations where we define

Re: [CSV] Strategies to handle duplicate headers

2023-06-21 Thread Gary Gregory

Well, maybe we should not have a postfix string method, that assumes a lot. A default implementation of a function to convert all header names sounds better. Gary On Wed, Jun 21, 2023, 09:11 Gary Gregory wrote: > So it is starting to sound like we need either to add to CSVFormat: > > -
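To make the "function to convert all header names" idea concrete, here is a minimal sketch. It assumes the converter is modelled as a plain `UnaryOperator<List<String>>` with a configurable separator; the class `HeaderConverters` and the method `numbered` are hypothetical names for illustration and are not part of `CSVFormat` or any existing Commons CSV API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class HeaderConverters {

    // Returns a converter that leaves the first occurrence of a name untouched
    // and appends separator + running count to each later occurrence.
    static UnaryOperator<List<String>> numbered(String separator) {
        return headers -> {
            Map<String, Integer> counts = new HashMap<>();
            List<String> converted = new ArrayList<>(headers.size());
            for (String header : headers) {
                // merge returns the updated count, so subtract 1 to get the
                // number of earlier occurrences of this name.
                int earlier = counts.merge(header, 1, Integer::sum) - 1;
                converted.add(earlier == 0 ? header : header + separator + earlier);
            }
            return converted;
        };
    }
}
```

With a separator of `"_"`, the header `A, B, A, A` would become `A, B, A_1, A_2`. Because the converter receives the whole header at once, a caller could swap in any other function without a dedicated postfix setting. This sketch does not check whether a generated name already exists elsewhere in the header.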

Re: [CSV] Strategies to handle duplicate headers

2023-06-21 Thread Gary Gregory

So it is starting to sound like we need either to add to CSVFormat: - "duplicate header postfix string", or - deprecate duplicate header mode in favor of a duplicate header strategy which holds a duplicate header mode plus a duplicate header postfix string and some functional interface for custom

Re: [CSV] Strategies to handle duplicate headers

2023-06-21 Thread David Dellsperger

I've always had a big concern with this kind of behavior, because what happens if the "new column" already exists but later in the header? It seems like python/pandas deals with this by incrementing AGAIN, so they read the header and THEN decide what to do with the values for duplicates (make
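The collision concern above can be illustrated with a short sketch of the "read the whole header first, then increment again" approach. It assumes a `.N` suffix in the style attributed to pandas in this thread; `HeaderDeduplicator` and `deduplicate` are hypothetical names and not Commons CSV API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class HeaderDeduplicator {

    static List<String> deduplicate(List<String> headers) {
        // Reserve every original name up front, so a generated name can
        // never collide with a column that appears later in the header.
        Set<String> taken = new HashSet<>(headers);
        Set<String> seen = new HashSet<>();
        Map<String, Integer> nextSuffix = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());
        for (String name : headers) {
            if (seen.add(name)) {
                // First occurrence keeps its original name.
                result.add(name);
                continue;
            }
            // Duplicate: keep incrementing until the candidate is unused.
            int n = nextSuffix.getOrDefault(name, 1);
            String candidate = name + "." + n;
            while (taken.contains(candidate)) {
                n++;
                candidate = name + "." + n;
            }
            taken.add(candidate);
            nextSuffix.put(name, n + 1);
            result.add(candidate);
        }
        return result;
    }
}
```

For the header `A, A, A.1`, this sketch renames the second column to `A.2` rather than `A.1`, because `A.1` is already claimed by the third column. The cost is that the full header must be known before any name is assigned.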

Re: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Bruno Kinoshita

Hi, > However, I could imagine situations where we define > DuplicateHeaderMode.DEDUPLICATE, and a user isn't satisfied with our > normalization strategy. For example, dots in the headers break ingesting > the data in a third-party system. An interface could resolve this, but I > guess in such

RE: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Seth Falco

I don't have a strong enough opinion to conclude what's best. Giving it more thought, I think the interface approach I proposed is overcomplicated tbh. I can't imagine needing another duplicate header mode after this. However, I could imagine situations where we define

Re: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Gary Gregory

That's clever. So we could implement a new enum value DuplicateHeaderMode.DEDUPLICATE... Gary On Tue, Jun 20, 2023, 14:09 Bruno Kinoshita wrote: > Hi, > > Bruno says: > > "With Pandas it automatically deduplicates the column names. Maybe > > that's a feature that we could have in Commons CSV

Re: [CSV] Strategies to handle duplicate headers

2023-06-20 Thread Bruno Kinoshita

Hi, Bruno says: > "With Pandas it automatically deduplicates the column names. Maybe > that's a feature that we could have in Commons CSV too?" > > What does that mean and actually do? Say I have column A with row 1 > value of "X" and 2nd column A with row 1 value of 2. What do I get > when I ask

[CSV] Strategies to handle duplicate headers

2023-06-20 Thread Gary Gregory

Hi All, This thread is a follow-up to https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258 Bruno says: "With Pandas it automatically deduplicates the column names. Maybe that's a feature that we could have in Commons CSV too?" What does that mean and actually do? Say I have
