On Tue, 20 Jun 2023 at 12:39, Gary Gregory wrote:
>
> Hi All,
>
> This thread is a follow-up to
> https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
>
> Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe
> that's a feature that we could have in
> I don't have a strong enough opinion to conclude what's best.
> Giving it more thought, I think the interface approach I proposed is
> overcomplicated tbh. I can't imagine needing another duplicate header
> mode after this.
> However, I could imagine situations where we define
Well, maybe we should not have a postfix string method; that assumes a lot.
A default implementation of a function to convert all header names sounds
better.
Gary
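Gary's preference for a default function that converts all header names could be sketched as a `java.util.function.UnaryOperator<List<String>>` applied to the whole header row. This is a minimal sketch under assumptions: `HeaderConverters`, `DEDUPLICATE_WITH_DOT`, and `trimThen` are hypothetical names, not Commons CSV API, and the default ignores collisions with names that appear later in the row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class HeaderConverters {

    private HeaderConverters() {
    }

    // Default converter: the first "A" stays "A", later ones become "A.1", "A.2", ...
    // Simplification: does not check whether "A.1" itself appears elsewhere in the row.
    // Assumes non-null header names.
    static final UnaryOperator<List<String>> DEDUPLICATE_WITH_DOT = headers -> {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());
        for (String name : headers) {
            int count = seen.merge(name, 1, Integer::sum); // occurrences so far, including this one
            result.add(count == 1 ? name : name + "." + (count - 1));
        }
        return result;
    };

    // A user-supplied step composed with another converter: trim every name, then delegate.
    static UnaryOperator<List<String>> trimThen(UnaryOperator<List<String>> next) {
        return headers -> {
            List<String> trimmed = new ArrayList<>(headers.size());
            for (String name : headers) {
                trimmed.add(name.trim());
            }
            return next.apply(trimmed);
        };
    }
}
```

Because the hook sees the whole row, a user who dislikes the default can replace it outright or compose it with their own normalization, as `trimThen` does, without the format needing a separate postfix setting.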
On Wed, Jun 21, 2023, 09:11 Gary Gregory wrote:
> So it is starting to sound like we need either to add to CSVFormat:
>
> -
So it is starting to sound like we need either to add to CSVFormat:
- "duplicate header postfix string", or
- deprecate duplicate header mode in favor of a duplicate header strategy
which holds a duplicate header mode plus a duplicate header postfix string
and some functional interface for custom
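The second option in the list above, a strategy object that bundles a mode, a postfix, and a custom hook, might look roughly like this. `DuplicateHeaderStrategy`, its nested `Mode`, and the factory methods are hypothetical; the `BiFunction` receives the original name and a 1-based duplicate counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical sketch of the "strategy" option; not Commons CSV API.
final class DuplicateHeaderStrategy {

    enum Mode { ALLOW_ALL, DISALLOW, RENAME }

    private final Mode mode;
    // Builds a replacement name from (original name, duplicate counter starting at 1).
    private final BiFunction<String, Integer, String> renamer;

    private DuplicateHeaderStrategy(Mode mode, BiFunction<String, Integer, String> renamer) {
        this.mode = Objects.requireNonNull(mode);
        this.renamer = Objects.requireNonNull(renamer);
    }

    static DuplicateHeaderStrategy allowAll() {
        return new DuplicateHeaderStrategy(Mode.ALLOW_ALL, (name, n) -> name);
    }

    static DuplicateHeaderStrategy disallow() {
        return new DuplicateHeaderStrategy(Mode.DISALLOW, (name, n) -> name);
    }

    // The "postfix string" variant expressed through the general hook: "A" -> "A" + postfix + n.
    static DuplicateHeaderStrategy renameWithPostfix(String postfix) {
        return new DuplicateHeaderStrategy(Mode.RENAME, (name, n) -> name + postfix + n);
    }

    // Fully custom renaming supplied by the user.
    static DuplicateHeaderStrategy rename(BiFunction<String, Integer, String> renamer) {
        return new DuplicateHeaderStrategy(Mode.RENAME, renamer);
    }

    List<String> apply(List<String> headers) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());
        for (String name : headers) {
            int count = seen.merge(name, 1, Integer::sum);
            if (count == 1 || mode == Mode.ALLOW_ALL) {
                result.add(name);
            } else if (mode == Mode.DISALLOW) {
                throw new IllegalArgumentException("Duplicate header name: " + name);
            } else {
                result.add(renamer.apply(name, count - 1));
            }
        }
        return result;
    }
}
```

The postfix-string variant then becomes just one factory (`renameWithPostfix`) over the general renaming hook, which is one argument for favoring the functional interface.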
I've always had a big concern with this kind of behavior, because what
happens if the "new column" already exists but later in the header? It
seems like Python/pandas deals with this by incrementing AGAIN, so they
read the header and THEN decide what to do with the values for duplicates
(make
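One way to avoid the collision described above is to read the whole header row first, reserve every name that literally appears in it, and only then pick suffixes that are still free. The sketch below is hypothetical (`CollisionAwareDedup` is not an existing class) and is not claimed to match pandas' exact algorithm; it keeps a real column such as `A.1` under its own name and makes the generated name skip ahead instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating two-pass, collision-aware deduplication.
final class CollisionAwareDedup {

    private CollisionAwareDedup() {
    }

    static List<String> deduplicate(List<String> headers, String separator) {
        // Pass 1: reserve every name that literally appears in the header row.
        Set<String> reserved = new HashSet<>(headers);
        Set<String> used = new HashSet<>();
        Map<String, Integer> nextSuffix = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());

        // Pass 2: keep first occurrences; rename later ones to a name nobody else owns.
        for (String name : headers) {
            if (used.add(name)) {
                result.add(name);
                continue;
            }
            int n = nextSuffix.getOrDefault(name, 1);
            String candidate = name + separator + n;
            while (reserved.contains(candidate) || used.contains(candidate)) {
                n++;
                candidate = name + separator + n;
            }
            nextSuffix.put(name, n + 1); // next duplicate of this name starts searching here
            used.add(candidate);
            result.add(candidate);
        }
        return result;
    }
}
```

For the header row `A, A, A.1` with separator `"."`, the second `A` becomes `A.2` because `A.1` is already taken by a real column, giving `[A, A.2, A.1]`.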
Hi,
> However, I could imagine situations where we define
> DuplicateHeaderMode.DEDUPLICATE, and a user isn't satisfied with our
> normalization strategy. For example, dots in the headers break ingesting
> the data in a third-party system. An interface could resolve this, but I
> guess in such
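The interface idea quoted above could be a small functional interface that only decides the replacement name, with a dot-based default for users who are happy with pandas-like names. `DuplicateHeaderNamer`, `DOT_SUFFIX`, and `apply` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface for illustration; not part of Commons CSV.
@FunctionalInterface
interface DuplicateHeaderNamer {

    // occurrence is 1 for the first duplicate, 2 for the second, and so on.
    String rename(String name, int occurrence);

    // Pandas-like default: "A", "A.1", "A.2", ...
    DuplicateHeaderNamer DOT_SUFFIX = (name, occurrence) -> name + "." + occurrence;

    // Applies a namer to a header row, leaving first occurrences untouched.
    static List<String> apply(List<String> headers, DuplicateHeaderNamer namer) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());
        for (String name : headers) {
            int count = seen.merge(name, 1, Integer::sum);
            result.add(count == 1 ? name : namer.rename(name, count - 1));
        }
        return result;
    }
}
```

A user whose downstream system rejects dots would pass `(name, n) -> name + "_" + n` instead, turning `id, id` into `id, id_1`.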
That's clever. So we could implement a new enum value
DuplicateHeaderMode.DEDUPLICATE...
Gary
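A new `DEDUPLICATE` constant could sit next to the existing `ALLOW_ALL`, `ALLOW_EMPTY`, and `DISALLOW` values. The sketch uses a stand-in enum, `DuplicateHeaderModeSketch`, so it compiles without Commons CSV; the `process` method, the treatment of null or empty names, and the fixed `"."` separator are assumptions, not how Commons CSV applies these modes internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DuplicateHeaderMode with an extra DEDUPLICATE value.
enum DuplicateHeaderModeSketch {
    ALLOW_ALL,
    ALLOW_EMPTY,
    DISALLOW,
    DEDUPLICATE;

    // Returns the header row to use, or throws if this mode rejects a duplicate.
    List<String> process(List<String> headers) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>(headers.size());
        for (String name : headers) {
            int count = seen.merge(name, 1, Integer::sum);
            boolean duplicate = count > 1;
            boolean empty = name == null || name.isEmpty(); // sketch assumption for "empty"
            switch (this) {
                case ALLOW_ALL:
                    result.add(name);
                    break;
                case ALLOW_EMPTY:
                    if (duplicate && !empty) {
                        throw new IllegalArgumentException("Duplicate header: " + name);
                    }
                    result.add(name);
                    break;
                case DISALLOW:
                    if (duplicate) {
                        throw new IllegalArgumentException("Duplicate header: " + name);
                    }
                    result.add(name);
                    break;
                case DEDUPLICATE:
                    result.add(duplicate ? name + "." + (count - 1) : name);
                    break;
                default:
                    throw new AssertionError(this);
            }
        }
        return result;
    }
}
```

Hard-coding the separator in the enum is the simplest shape, but it is exactly what a downstream system that rejects dots would trip over, which is where the interface or strategy options discussed in this thread come in.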
On Tue, Jun 20, 2023, 14:09 Bruno Kinoshita wrote:
> Hi,
>
> Bruno says:
> > "With Pandas it automatically deduplicates the column names. Maybe
> > that's a feature that we could have in Commons CSV
Hi,
Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe
> that's a feature that we could have in Commons CSV too?"
>
> What does that mean and actually do? Say I have column A with row 1
> value of "X" and 2nd column A with row 1 value of 2. What do I get
> when I ask
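The question quoted above can be answered with a small sketch, assuming pandas-style naming, where the `read_csv` documentation describes duplicate columns as `X`, `X.1`, …, `X.N`. `DedupRecordDemo.toRecord` is a hypothetical helper that pairs one row with the renamed headers in a `LinkedHashMap`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: pairs one CSV row with pandas-style deduplicated header names.
final class DedupRecordDemo {

    private DedupRecordDemo() {
    }

    static Map<String, String> toRecord(List<String> headers, List<String> values) {
        if (headers.size() != values.size()) {
            throw new IllegalArgumentException("Header and row lengths differ");
        }
        Map<String, Integer> seen = new HashMap<>();
        Map<String, String> record = new LinkedHashMap<>(); // keeps column order
        for (int i = 0; i < headers.size(); i++) {
            String name = headers.get(i);
            int count = seen.merge(name, 1, Integer::sum);
            String key = count == 1 ? name : name + "." + (count - 1);
            record.put(key, values.get(i));
        }
        return record;
    }
}
```

For headers `A, A` and row `X, 2`, the result is `{A=X, A.1=2}`: asking for `A` returns `X`, and the second column is only reachable as `A.1`. This naive version does not handle a literal `A.1` later in the row; that column would overwrite the generated entry.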
Hi All,
This thread is a follow-up to
https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
Bruno says:
"With Pandas it automatically deduplicates the column names. Maybe
that's a feature that we could have in Commons CSV too?"
What does that mean and actually do? Say I have